This article explores the various tools and technologies currently being leveraged (such as Hadoop, which is useful for developing applications that perform statistical analysis on vast quantities of data) and the issues faced when using them (heterogeneity, timeliness, security, incompleteness, and scalability of the data are the biggest obstacles when analyzing big data). What are some additional areas where big data utilization can grow? What needs to improve? What other technologies do you envision being used in collaboration with big data in the future, and in what ways?
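As a concrete illustration of the kind of aggregate analysis Hadoop supports, below is a minimal word-count sketch written for Hadoop Streaming, which lets the mapper and reducer be plain scripts that read stdin and write stdout. The file name, paths, and invocation shown are illustrative assumptions, not details from this article:

```python
#!/usr/bin/env python3
"""wordcount.py -- a minimal Hadoop Streaming mapper/reducer pair.

Illustrative invocation (paths and jar location are assumptions):
  hadoop jar hadoop-streaming.jar \
      -input /data/text -output /data/counts \
      -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
      -file wordcount.py
"""
import sys

def mapper():
    # Emit a (word, 1) pair for every word; Hadoop shuffles and
    # sorts these pairs by key before they reach the reducer.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Keys arrive sorted, so each word's counts are contiguous
    # and can be summed with a single running total.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # One script serves both stages, selected by a command-line flag.
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Real deployments add combiners, counters, and fault handling, but the map/shuffle/reduce shape above is the core of the model.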
1. What is big data?
'Big Data' is a term used to describe data sets that are enormous in volume and that grow exponentially over time. Such massive data is complicated to process using traditional data management tools.
Most organizations face difficulty creating, operating, and administering such vast amounts of data across large numbers of datasets. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyze substantial datasets.
Due to internet and social media penetration, the amount of data produced has grown significantly in the past five years. Daily data generation exceeds 2.5 quintillion bytes.
This vast amount of data accumulates from sources worldwide, from the sensors used to collect climate information to digital pictures and videos. It also includes posts to social media sites, purchase transaction records, and mobile phone GPS signals. Data is growing at an immense speed: by the year 2020, about 1.7 megabytes of new data will be created every second for every human being on the planet.
For example, on Google alone users perform more than 40,000 search queries every second, which works out to roughly 3.5 billion searches per day and approximately 1.2 trillion searches per year.
It is estimated that big data will drive $48.6 billion in annual spending by the year 2019, and that by the year 2020 the rate of data production will be 44 times greater than it was in 2009. More than 70% of the digital universe is created by individuals, but 80% of big data is stored and managed by enterprises.
It is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions; a petabyte is equal to one quadrillion bytes. It is also estimated that by the year 2020, one third of all data will be stored in, or will have passed through, the cloud, and that a total of 45 zettabytes of data will have been created.
When we speak about big data, as we have done above, we often use it as a catch-all term for the enormous volume of structured and unstructured data, spread across many huge datasets, that traditional database management techniques and associated software cannot process.
Big Data is a concept, and a concept can have various interpretations.
Examples of big data:
- The New York Stock Exchange generates about one terabyte of new trade data per day.
- Every day, more than 500 terabytes of newly generated data is absorbed into the databases of social media sites such as Facebook. This data is created by uploading images and videos, exchanging messages, adding comments, and so on.
- A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data creation runs to many petabytes.
CHARACTERISTICS OF BIG DATA:
In 2001, Gartner's Doug Laney first articulated the "three Vs of big data," which describe the characteristics that distinguish big data from conventional data processing:
- VOLUME: Volume refers to the enormous quantity of data generated every second. Traditional database technology struggles to store and analyze data sets that are too large. The size of the data plays the most crucial role in determining its worth: whether a given data set can be considered big data at all depends on its volume. This is why 'volume' is treated as a defining characteristic of big data.
Fig a: Characteristics of Big Data
- VELOCITY: Velocity is the rate at which data is created, accumulated, analyzed, and visualized. Previously, when batch processing was common practice, it was reasonable to receive a database update every night or even every week, since computers and servers required large amounts of time to process the data and update the databases. In the big data era, data is generated in real time or near real time, and internet-connected devices, wireless or wired, can pass their data on the moment it is generated.
The velocity of big data concerns the speed at which data flows in from sources such as application logs, networks, business processes, mobile devices, social media sites, and sensors. The flow of data is massive and continuous (a minimal streaming sketch appears after this list).
- VARIETY: Variety refers to the many different types of data we can now use. In earlier days the focus was on structured data that fit neatly into tabular columns or relational databases, such as financial data. It is estimated that almost 80% of the world's data is unstructured (images, voice, text, video, etc.). With the help of big data technology, we can now analyze and combine diverse data types such as social media conversations, messages, sensor data, video, photos, and voice recordings.
Previously, all data fit correctly into rows and columns because the data being created was structured, but those days are gone. Today, roughly 90% of the data created in an organization is unstructured, and data arrives in a range of formats: structured, semi-structured, unstructured, and complex structured data. This wide range of formats requires a different approach and diverse methods to collect and store the raw data (see the format-normalization sketch at the end of this section).
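To make the velocity point concrete, here is a minimal sketch of processing events as they arrive rather than in a nightly batch. The event source is simulated; in a real system it might be a message queue, socket, or log stream, and the window size is an arbitrary choice for the example:

```python
#!/usr/bin/env python3
"""Toy illustration of velocity: aggregate a stream as events arrive
instead of waiting for a nightly batch job to run."""
import random
import time
from collections import deque

WINDOW_SECONDS = 5          # keep only the last 5 seconds of events
events = deque()            # timestamps of recent events

def handle(event_time: float) -> None:
    # Record the new event, then drop everything older than the window.
    events.append(event_time)
    while events and events[0] < event_time - WINDOW_SECONDS:
        events.popleft()

if __name__ == "__main__":
    for _ in range(50):                 # simulate a short burst of traffic
        time.sleep(random.uniform(0.0, 0.1))
        handle(time.time())
        # The running count is available immediately, not hours later.
        print(f"events in last {WINDOW_SECONDS}s: {len(events)}")
```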
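And to illustrate variety, the following sketch normalizes records that arrive in two different shapes, structured CSV and semi-structured JSON, into one common form before analysis. The field names (user, action) are invented for the example:

```python
#!/usr/bin/env python3
"""Toy illustration of variety: normalize mixed-format records
(structured CSV and semi-structured JSON) into one common shape."""
import csv
import io
import json

def normalize(raw: str) -> dict:
    """Return a {user, action} dict regardless of the input format."""
    raw = raw.strip()
    if raw.startswith("{"):
        # Semi-structured: JSON, where keys may be missing.
        obj = json.loads(raw)
        return {"user": obj.get("user", "unknown"),
                "action": obj.get("action", "unknown")}
    # Structured: a CSV line with a fixed column order (user, action).
    row = next(csv.reader(io.StringIO(raw)))
    return {"user": row[0], "action": row[1]}

if __name__ == "__main__":
    mixed = [
        'alice,login',                          # structured CSV
        '{"user": "bob", "action": "upload"}',  # semi-structured JSON
        '{"user": "carol"}',                    # JSON with a missing field
    ]
    for record in mixed:
        print(normalize(record))
```

Unstructured inputs such as images or audio need heavier tooling, but the same principle applies: bring every source into a common representation before combining it for analysis.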