DDDM helps management make better-informed decisions, but when data streams in from many sources, the process becomes harder as management tries to keep up with the velocity of the incoming data. Read this article to explore the challenges of making decisions with streaming data and the adaptations the decision-making framework needs in order to keep those decisions informed.
Streamed Big Data
This section discusses the characteristics of streamed big data and the challenges of learning from it.
Characteristics of streamed big data
Big data is an outcome of the current information explosion and is relevant to a diverse range of fields in the natural, life, social, and applied sciences, including physics, biology, medicine, economics, and management. Big data has been widely characterized by the three Vs: a hugely increased Volume of data, a Variety of data sources and quality, and the high Velocity at which data is generated or obtained. Big data technology holds incredible promise for improving people's lives, accelerating scientific discovery and innovation, and instigating positive societal change. At the same time, new challenges need to be overcome. They stem from the heterogeneity, incompleteness, scale, timeliness, privacy, and process complexity of big data, and they span data acquisition, data storage, information extraction, and big data analysis. Three further Vs have since been recognized as big data analysis has developed: Veracity, which focuses on the unreliability inherent in data sources; Variability, which refers to variations in data flow rates; and Value, which refers to the low value density of the data.
Challenges in streamed big data
Eight challenges of big streaming data have been discussed, covering the full cycle of knowledge discovery from data. We consider these challenges from three aspects: (1) the development of new data mining skills for big streaming data; (2) the development of simpler, self-adaptive machine learning algorithms; and (3) the privacy and confidentiality requirements that must be met for users and society to trust the system.
As data evolves over time, the validity and reliability of historical data become questionable. Decision support for big streaming data has to take this into account in order to deliver accurate, up-to-date, real-time analysis. One example is the detection of highway flooding. Although data streams, online learning, big data, and adaptation to concept drift have become important research topics during the last decade, a truly autonomous, self-maintaining, adaptive data mining system is still lacking. The short lifespan of data keeps us from storing and accessing all historical data during each processing cycle, yet processing accuracy is strictly limited when the data can be accessed only once (the one-pass setting). This is critical when concept drift occurs, because good and bad data samples are treated equally when they are used to learn a new concept. Because computing resources such as hardware and storage space have become more efficient and effective, it is more practical to adopt a limited-storage assumption than a zero-storage assumption when discussing decision support for high-volume streaming data. In addition, previous decisions no longer apply once the data evolves, and they have to be replaced according to the current situation. When to change a decision and how to carry out that change are therefore two unsolved aspects of this problem, and both become more difficult when multiple streams are involved.
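To make the limited-storage idea concrete, the following is a minimal sketch rather than a method taken from the article. It keeps two bounded windows of recent values and flags a possible concept drift when their means differ by more than a threshold. The class name `WindowedDriftMonitor`, the parameters `window_size` and `threshold`, and the mean-difference test itself are all illustrative assumptions; practical systems typically rely on more robust statistical tests.

```python
from collections import deque
from statistics import mean


class WindowedDriftMonitor:
    """Hypothetical helper: bounded buffers plus a simple mean-shift check."""

    def __init__(self, window_size=50, threshold=1.0):
        self.window_size = window_size
        self.threshold = threshold
        # Limited storage: at most 2 * window_size values are ever retained.
        self.reference = deque(maxlen=window_size)
        self.recent = deque(maxlen=window_size)

    def update(self, value):
        """Consume one value; return True if drift is flagged at this step."""
        # Fill the reference window first; it represents the current concept.
        if len(self.reference) < self.window_size:
            self.reference.append(value)
            return False
        # Afterwards, new values slide through the recent window.
        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return False
        drifted = abs(mean(self.recent) - mean(self.reference)) > self.threshold
        if drifted:
            # Replace the outdated concept: recent data becomes the reference.
            self.reference = deque(self.recent, maxlen=self.window_size)
            self.recent.clear()
        return drifted


# Synthetic stream (an assumption for illustration): the mean jumps at index 100.
stream = [0.0] * 100 + [5.0] * 100
monitor = WindowedDriftMonitor(window_size=20, threshold=1.0)
flags = [i for i, x in enumerate(stream) if monitor.update(x)]
print(flags)
```

In this sketch, each flag marks a point where a decision might be revisited (the "when"), and swapping the reference window is one simple way of carrying out the change (the "how"). Because the reference window still holds some old data right after a flag, a single shift can be flagged more than once before the monitor settles.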