Predictive Analytics and Consumer Loyalty

Using big data to target brand success and build brand equity has become increasingly valuable. Review the results of this predictive-analysis research and assess how the loyalty rules were derived from the model. Was the classification of the consumers predictive or reflective?

Research tools

Hortonworks Data Platform (HDP)

An open-source framework for the distributed storage and processing of large, multi-source datasets. HDP enables flexible application deployment, machine learning and deep learning workloads, real-time data storage, security, and governance. It is a key element of the modern data architecture (Fig. 1).

Fig. 1 Hortonworks Data Platform: HDP 3.1

The HDP framework was custom-installed to obtain only the tools and systems required to track all stages of this work. These tools and systems were: the Hadoop distributed file system (HDFS) for data storage, the Spark execution engine for data processing, YARN for resource management, Zeppelin as the development user interface, Ambari for system monitoring, Ranger for system security, and Flume and Sqoop for data acquisition from Syriatel's data sources into HDFS in our dedicated framework.
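To illustrate how such a custom installation is typically used from the application side, the sketch below shows a PySpark session submitted to YARN that reads records previously landed in HDFS. This is a minimal, hedged example: the application name, executor sizing, and HDFS path are hypothetical placeholders, not settings taken from the study.

```python
from pyspark.sql import SparkSession

# Build a Spark session that runs on the cluster's YARN resource manager
# and can access Hive tables registered in the platform's metastore.
spark = (
    SparkSession.builder
    .appName("syriatel-loyalty-pipeline")    # hypothetical application name
    .master("yarn")                          # YARN handles resource management
    .config("spark.executor.memory", "24g")  # illustrative sizing only
    .config("spark.executor.cores", "8")
    .enableHiveSupport()
    .getOrCreate()
)

# Read raw records that Flume/Sqoop ingested into HDFS (path is hypothetical)
raw = spark.read.parquet("hdfs:///data/telecom/raw_events")
raw.printSchema()
```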

Hive is an ETL and data warehousing tool built on top of the Hadoop ecosystem and is used for processing structured and semi-structured data. It acts as a database layer within the Hadoop ecosystem, performs DDL and DML operations, and provides a flexible query language, HQL, for better querying. Hive was used in MapReduce mode because the data was distributed across multiple data nodes, allowing queries to execute in parallel with better performance. The hardware resources used included 12 nodes, each with 32 GB of RAM, 10 TB of storage capacity, and a 16-core processor.

The Spark engine was used in most phases of model building, such as data processing, feature engineering, and model training and testing, because it can keep its data in the compute engine's memory (RAM) and perform processing on that in-memory data, eliminating the need for continuous input/output (I/O) of writing and reading data to and from disk. Spark offers further advantages as well; one of them is that it provides a variety of libraries covering all stages of the machine learning life cycle.
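The sketch below illustrates the kind of workflow this setup enables: an HQL query against a Hive table executed through Spark, caching of the result in executor memory to avoid repeated disk I/O, and a simple MLlib pipeline for feature assembly and model training. The table name, columns, split ratio, and the random forest classifier are hypothetical stand-ins, not the study's actual schema, feature set, or model.

```python
from pyspark.sql.functions import col
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Query a Hive table with HQL through Spark (database, table, and columns are hypothetical)
events = spark.sql("""
    SELECT customer_id, calls_per_day, data_mb, tenure_months, is_loyal
    FROM telecom_dw.customer_summary
""").withColumn("is_loyal", col("is_loyal").cast("double"))  # MLlib expects a numeric label

# Keep the working set in executor memory so later stages avoid repeated disk I/O
events.cache()

# Minimal feature-engineering and training pipeline with Spark MLlib
assembler = VectorAssembler(
    inputCols=["calls_per_day", "data_mb", "tenure_months"],
    outputCol="features",
)
clf = RandomForestClassifier(labelCol="is_loyal", featuresCol="features")
pipeline = Pipeline(stages=[assembler, clf])

train, test = events.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)
predictions = model.transform(test)
predictions.select("customer_id", "prediction").show(5)
```

Because the cached DataFrame stays in memory, the split, fit, and transform steps reuse it directly instead of re-reading the Hive table from disk, which is the advantage the paragraph above attributes to Spark.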