4. Conclusions

As a limitation, we know that we were not able to collect all the existent research about the topic, due to the vast number of synonyms and constraints in the major search terms. However, the high quantity of primary studies - a set of 1376 papers was obtained - provides a trustworthy and complete data set.

The SLR study on Big Data modeling and management conducted in this paper has identified 36 relevant studies. These works have been selected based on the inclusion and selection process of research papers from digital scientific libraries until August 2019; there are no published papers prior to 2010.

We have raised three research questions that were answered through bibliometric analysis, an SLR and a discussion. The results of the bibliometric study provide a lot of relevant information. For instance, from 2015 onwards, the number of studies has increased significantly as this was the year when many countries started to contribute to the topic of interest. The leading countries in this topic are the USA and China. The USA have one of the most impactful studies.

The results from the SLR and from the Discussion also reveal some very interesting facts. For instance, more than 50% of the studies do not verify their proposals with real-world datasets. Big Data's velocity characteristic is not justified by 90% of the studies. ER is the most used model at the conceptual abstraction level, document-oriented is the most researched model at the logical abstraction level and MongoDB DBMS is the most frequent implementation at the physical level. Moreover, as the main gaps, we identified the lack of proposal evaluations and a few studies focused on query-driven methodology and hybrid database solutions.

The contribution of this SLR study, by clarifying the knowledge on the specific topic of Big Data modeling, can support researchers and practitioners in improving their Big Data projects. The relevant works collected can be a useful starting point to new studies into Big Data modeling.

Finally, we know that, due to the need to analyze large volumes of data with a variety of structures, which arrive in high frequency, database research became more focused towards NoSQL. However, NoSQL DBMSs have not been able to acquire some strengths that are already present in relational databases, such as the ability to support Consistency and Availability at the same time; for this reason, NewSQL has emerged and may pose a solution to the problems faced by both DBMSs. As a future work, we also expect to study this new database system.

We hope to keep the SLR up to date and present results for other concepts. Moreover, based on the gaps, we will drive our research in these aspects.