Discussion

This paper has discussed how uncertainty can affect big data, both in terms of the analytics and in terms of the dataset itself. Our aim was to review the state of the art in big data analytics techniques, to examine how uncertainty can negatively impact such techniques, and to identify the open issues that remain. For each common technique, we have summarized the relevant research to aid others in this community when developing their own techniques. We have discussed the issues surrounding the five V's of big data; however, many other V's exist. In terms of existing research, most of the focus has been on the volume, variety, velocity, and veracity of data, with less work available on value (e.g., data related to corporate interests and decision making in specific domains).


Future research directions

This paper has uncovered many avenues for future work in this field. First, additional study must be performed on the interactions among the big data characteristics, as they do not exist in isolation but interact naturally in the real world. Second, the scalability and efficacy of existing analytics techniques applied to big data must be empirically examined. Third, new techniques and algorithms must be developed in ML and NLP to meet the real-time demands of decisions based on enormous amounts of data. Fourth, more work is necessary on how to efficiently model uncertainty in ML and NLP, as well as on how to represent the uncertainty resulting from big data analytics. Fifth, because CI algorithms can find approximate solutions within a reasonable time, in recent years they have been used to tackle ML problems and uncertainty challenges in data analytics and processing. However, there is still a lack of CI metaheuristic algorithms applied to big data analytics for mitigating uncertainty.