Read this article. What are the main challenges of using big data?
Background
Volume of Big Data
The volume of Big Data is typically large, although no specific threshold in petabytes defines it. Growth in the volume of data records is typically managed by purchasing additional online storage; however, the relative value of each data point decreases in proportion to aspects such as age, type, quantity, and richness, so such expenditure is difficult to justify. The following two subsections detail the volume of Big Data in relation to the rapid growth of data and the development rate of hard disk drives (HDDs). This section also examines Big Data in the current environment of enterprises and technologies.
Rapid Growth of Data
The data type that increases most rapidly is unstructured data. This data type is characterized by "human information" such as high-definition videos, movies, photos, scientific simulations, financial transactions, phone records, genomic datasets, seismic images, geospatial maps, e-mail, tweets, Facebook data, call-center conversations, mobile phone calls, website clicks, documents, sensor data, telemetry, medical records and images, climatology and weather records, log files, and text. According to Computer World, unstructured information may account for upward of 70% to 80% of all data in organizations. These data, which mostly originate from social media, constitute 80% of the data worldwide and account for 90% of Big Data. Currently, 84% of IT managers process unstructured data, and this percentage is expected to drop by 44% in the near future. Most unstructured data are not modeled, are random, and are difficult to analyze. Many organizations must therefore develop appropriate strategies to manage such data. Table 1 further describes the rapid production of data in various organizations.
Source | Production |
YouTube | (i) Users upload 100 hours of new videos per minute (ii) Each month, more than 1 billion unique users access YouTube (iii) Over 6 billion hours of video are watched each month, which corresponds to almost an hour for every person on Earth. This figure is 50% higher than that generated in the previous year |
Facebook | (i) Every minute, 34,722 Likes are registered (ii) 100 terabytes (TB) of data are uploaded daily (iii) Currently, the site has 1.4 billion users (iv) The site has been translated into 70 languages |
Twitter | (i) The site has over 645 million users (ii) The site generates 175 million tweets per day |
Foursquare | (i) This site is used by 45 million people worldwide (ii) This site gets over 5 billion check-ins per day (iii) Every minute, 571 new websites are launched |
Google+ | 1 billion accounts have been created |
Google | (i) The site gets over 2 million search queries per minute (ii) Every day, 25 petabytes (PB) are processed |
Apple | Approximately 47,000 applications are downloaded per minute |
Brands | More than 34,000 Likes are registered per minute |
Tumblr | Blog owners publish 27,000 new posts per minute |
| Users share 40 million photos per day |
Flickr | Users upload 3,125 new photos per minute |
| 2.1 million groups have been created |
WordPress | Bloggers publish nearly 350 new blogs per minute |
Table 1 Rapid growth of unstructured data.
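The per-minute figures in Table 1 are easier to compare with the per-day figures once they share a unit. The following is a minimal sketch of that conversion, using only rates quoted in the table; the dictionary labels and the `per_day` helper are illustrative names, not part of the original article.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Per-minute rates copied from Table 1 (labels are illustrative).
per_minute = {
    "YouTube hours of video uploaded": 100,
    "Likes registered": 34_722,
    "Apple application downloads": 47_000,
    "Tumblr posts": 27_000,
    "Flickr photo uploads": 3_125,
    "WordPress blogs": 350,
}


def per_day(rate_per_minute):
    """Scale a per-minute rate to a per-day total, assuming a constant rate."""
    return rate_per_minute * MINUTES_PER_DAY


daily = {name: per_day(rate) for name, rate in per_minute.items()}

for name, total in daily.items():
    print(f"{name}: {total:,} per day")
```

Under the constant-rate assumption, 34,722 Likes per minute works out to just under 50 million Likes per day, and 3,125 photo uploads per minute to 4.5 million per day.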
Development Rate of Hard Disk Drives (HDDs)
The demand for digital storage is highly elastic. It cannot be completely met and is controlled only by budgets and management capability and capacity. Goda and Kitsuregawa [K. Goda and M. Kitsuregawa] discuss the history of storage devices, from magnetic tapes and disks to optical, solid-state, and electromechanical devices. Prior to the digital revolution, information was predominantly stored on analogue videotapes according to the available bits. As of 2007, however, most data were stored in HDDs (52%), followed by optical storage (28%) and digital tapes (roughly 11%). The share of paper-based storage dwindled from 0.33% in 1986 to 0.007% in 2007, although its capacity steadily increased (from 8.7 optimally compressed PB to 19.4 optimally compressed PB). Figure 2 depicts the rapid development of HDDs worldwide.
Figure 2 Worldwide shipment of HDDs from 1976 to 2013.
The HDD is the main component in electromechanical devices. In 2013, the expected revenue from global HDD shipments was $33 billion, down 12% from the predicted $37.8 billion in 2012. Data regarding the quantity of units shipped between 1976 and 1998 were obtained from Datasheetcatalog.com, 1995; Mandelli and Bossi, 2002; MoHPC, 2003; Helsingin Sanomat, 2000; Belk, 2007; and J. Woerner, 2010; data for 1999 to 2004 were provided by Freescale Semiconductors, 2005; PortalPlayer, 2005; NVIDIA, 2009; and Jeff, 1997; data for 2005 and 2006 were obtained from the Securities and Exchange Commission, 1998; data for 2007 were provided by [Morgan Stanley–S. Ethier]; and data for 2009 to 2013 were obtained from [Coughlin Associates]. Based on this information, the quantity of HDDs shipped will exceed 1 billion annually by 2016, given a progression rate of 14% from 2014 to 2016. Figure 2 presents the quantities of HDDs shipped per year, including those for 1976, 1980, 1990, 2000, and 2012. According to Coughlin Associates, HDD expenditures are expected to increase by 169% from 2011 to 2016, thus affecting the current enterprise environment significantly. Given this finding, the following section discusses the role of Big Data in the current enterprise environment.
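The percentages in this subsection can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the 14% progression rate compounds annually over three steps (2014, 2015, and 2016) from a 2013 base; that reading, and the helper names `compound` and `required_base`, are assumptions made for illustration and are not stated in the article.

```python
def compound(value, rate, years):
    """Grow a value by a fixed annual rate, compounded yearly."""
    return value * (1 + rate) ** years


def required_base(target, rate, years):
    """Starting value needed to reach the target after compounding."""
    return target / (1 + rate) ** years


# Revenue: a 12% drop from the predicted $37.8 billion (figures from the text).
revenue_2012 = 37.8  # billions of dollars
revenue_2013 = revenue_2012 * (1 - 0.12)

# Shipments: the 2013 volume needed to reach 1 billion units in 2016
# at 14% per year (assumed: annual compounding, three yearly steps).
base_2013 = required_base(1_000_000_000, 0.14, 3)

# Expenditure: a 169% increase means the 2016 figure is 2.69 times the 2011 one.
expenditure_multiplier = 1 + 1.69

print(f"2013 revenue: ${revenue_2013:.2f} billion")
print(f"2013 shipments needed: {base_2013 / 1e6:.1f} million units")
print(f"2016 expenditure as a multiple of 2011: {expenditure_multiplier:.2f}")
```

Under these assumptions, the revenue figure comes to about $33.3 billion, consistent with the $33 billion quoted, and annual shipments of roughly 675 million units in 2013 would be enough to pass 1 billion by 2016.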