Big Data Analytics for Disparate Data

There are some common issues when dealing with big data. Two critical ones are data quality and data variety (such as multiple formats within the dataset) – deep learning techniques, such as dimension reduction, can be used to solve these problems. Traditional data models and machine learning methods struggle to deal with these data issues, further supporting the case of deep learning, as the former cannot handle complex data with the framework of Big Data.

Using the 7Vs characteristics of big data, assign an issue you recognize in your industry to each and think of possible solutions. Additionally, write down the positives and if some of these key points could be added elsewhere in your industry in a different manner.

Abstract

Disparate data is heterogeneous data with variety in data formats, diverse data dimensionality, and low quality. Missing values, inconsistencies, ambiguous records, noise, and high data redundancy contribute to the 'low quality' of disparate data. It is a challenge to integrate disparate data from various sources. Big data is often disparate, dynamic, untrustworthy, and inter-related. Big Data analytics can be used to analyze correlation between factors and detect patterns or uncover unknown trends in disparate data. This paper introduces quality problems of disparate data. Some methods and technology progress regarding Big Data analytics for disparate data are presented. Challenges of Big Data analytics in dealing with disparate data are also discussed in this paper.

Keywords: Big Data, Big Data Analytics, Disparate Data, Intelligent Information Systems, Deep Learning, Machine Learning, Data Mining, Artificial Intelligence, Knowledge Discovery


Source: Lidong Wang and Randy Jones, http://article.sapub.org/10.5923.j.ajis.20170702.01.html
Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 License.