Predictive Analytics and Consumer Loyalty

Using big data to target brand success and build equity has become valuable. Review the results of this predictive analysis research and assess how the loyalty rules were derived from this model. Was the classification of the consumers predictive or reflective?

Implementation methodology

The proposed model was designed to determine customer loyalty. It relied on data mining and customer value analysis methods based on TFM (time, frequency, monetary) to improve customer relationship management. Customers were segmented by calculating a TFM score.

After customers were segmented and the degree of loyalty in each segment was determined, classification was performed using descriptors that express the level of loyalty in each segment together with a set of behavioral and demographic features. Several algorithms were applied and their models evaluated to obtain the model with the highest accuracy. The steps required for the proposed model are illustrated in Fig. 9.

Fig. 9

Steps required for the proposed model

The loyalty features of these models were extracted to find out the causes of loyalty and the precise targeting of these segments.

Data preparation

  • The data was collected from Syriatel data sources to the Hadoop environment.

  • The Scala language was chosen for data preparation, attribute extraction, model training, and testing because it is the language in which the distributed execution engine (Spark) was developed. Spark achieves high-speed execution in addition to stability. The machine learning library provided by Spark, the ML extension, was used.

  • The original data was divided into two sub-parts: the training group and the test group by 70/30, respectively.
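The 70/30 split can be illustrated with a minimal sketch. It is written in plain Python rather than the Scala/Spark pipeline the study used, and the function name `train_test_split`, the fixed seed, and the sample rows are assumptions made for illustration. Spark itself offers `DataFrame.randomSplit` for this purpose; whether the study used it is not stated.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical customer records, for illustration only.
customers = [{"customer_id": i} for i in range(10)]
train, test = train_test_split(customers)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that simply follows the order in which records were collected.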

Address missing and text values

Filling in missing values with either zero or the average of several nearby values made it possible to use the information in most attributes for training. In this research, the following were applied:

  • Attributes with missing values in at least 70% of records were deleted.

  • Missing numerical values were replaced with the mean of the attribute itself.

  • StringIndexer was applied to string-typed attributes to convert them into numeric indices. A common pattern is to produce label indices with StringIndexer, train a model on these indices, and recover the original labels from the predicted index column using IndexToString; custom labels can also be supplied. Emphasis was placed on developing the attribute preparation and attribute selection process.
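The three rules above can be sketched as follows. This is a plain-Python illustration, not the Spark code used in the study: the helper names, the column names, and the sample rows are hypothetical. The `string_index` helper orders labels by descending frequency, which mirrors StringIndexer's default behavior; the alphabetical tie-break is a choice made here.

```python
from collections import Counter


def drop_sparse_columns(rows, threshold=0.7):
    """Drop columns whose share of missing (None) values is >= threshold."""
    columns = list(rows[0].keys())
    keep = [c for c in columns
            if sum(r[c] is None for r in rows) / len(rows) < threshold]
    return [{c: r[c] for c in keep} for r in rows]


def fill_with_mean(rows, column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, column: mean if r[column] is None else r[column]}
            for r in rows]


def string_index(values):
    """Map strings to indices, most frequent first (ties alphabetical)."""
    counts = Counter(values)
    labels = sorted(counts, key=lambda v: (-counts[v], v))
    mapping = {label: float(i) for i, label in enumerate(labels)}
    return [mapping[v] for v in values], labels


# Hypothetical records: "fax" is missing in 3 of 4 rows (75%).
rows = [
    {"plan": "prepaid", "minutes": 120.0, "fax": None},
    {"plan": "postpaid", "minutes": None, "fax": None},
    {"plan": "prepaid", "minutes": 60.0, "fax": None},
    {"plan": "prepaid", "minutes": 90.0, "fax": 1.0},
]
rows = drop_sparse_columns(rows)
rows = fill_with_mean(rows, "minutes")
indices, labels = string_index([r["plan"] for r in rows])
# The labels list plays the role of IndexToString: index -> original label.
restored = [labels[int(i)] for i in indices]
```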

Data processing and application of extraction and selection of attributes

  • The T, F, M features were calculated for each customer, and the behavioral features were chosen.

  • The most important attributes were chosen using the chi-squared test. It was applied to groups of categorical attributes and the selected features to assess, from their frequency distributions, the likelihood of an association between them.
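To make the chi-squared step concrete, the sketch below computes the Pearson chi-squared statistic between one categorical feature and the loyalty label from their joint frequency distribution. It is a plain-Python illustration under assumed data, not the Spark selector used in the study; the feature names and values are hypothetical.

```python
from collections import Counter


def chi_squared(feature, label):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Expected count if feature and label were independent.
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical data: "plan" tracks loyalty perfectly, "region" not at all.
loyalty = ["high", "high", "low", "low"]
plan = ["a", "a", "b", "b"]
region = ["x", "y", "x", "y"]
print(chi_squared(plan, loyalty))    # 4.0
print(chi_squared(region, loyalty))  # 0.0
```

For features with the same number of categories, a larger statistic indicates a stronger association with the label, so such features would be ranked higher during selection.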

Compilation using calculating TFM

To calculate the TFM results, each of the three input parameters was divided into five subcategories. The time, frequency, and monetary scores were calculated and then combined to obtain a comprehensive TFM score.

Cumulative total duration (T)

Time (T): total duration of calls and Internet sessions in a certain period of time.

Customers were divided into categories (Table 5) by cumulative total duration of service (T), for example the total duration of calls and Internet sessions over 3 months.

Table 5 Assessment (cumulative total duration) T

Some research defines T as the average time spent communicating or using application services in one month; in our study, the period was three months. Under that definition, when the user's connection time is greater than the previously calculated average, the value of T is 2; otherwise, it is 1. In our study, five levels of T were used instead, derived after calculating the maximum and minimum cumulative total duration of calls and usage per GSM. T1 was the category with the lowest value, T2 the second category, T3 the third, T4 the high-value category, and T5 the very-high-value category.
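The text states that the five T levels were derived from the minimum and maximum cumulative duration but does not give the cut points. The sketch below assumes equal-width bins between those two values, which is only one possible reading; the function name `five_levels` and the sample durations are hypothetical.

```python
def five_levels(values):
    """Assign each value a level 1..5 using equal-width bins (an assumption)."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / 5
    if width == 0:
        # All customers have the same duration: put everyone in level 1.
        return [1 for _ in values]
    # The maximum value would land in a sixth bin, so cap the level at 5.
    return [min(5, int((v - lowest) / width) + 1) for v in values]


# Hypothetical cumulative durations (e.g. minutes over 3 months).
durations = [0, 100, 250, 499, 500]
levels = five_levels(durations)
print(levels)  # [1, 2, 3, 5, 5]
```

Quantile-based bins would be an equally plausible choice and would give each level roughly the same number of customers.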


Frequency (F): how often services are used within a certain period

Customers were divided into frequency categories (Table 6) according to the total number of transactions each customer made with the company (calls, messages, Internet access) during the past 3 months.

F1 represents customers who made 2 or fewer transactions in the last 3 months, and F5 represents customers who made 11 or more transactions in the last 3 months.

Table 6 Frequency

F1: 2 transactions or fewer
F5: 11 transactions or more

In our study, five levels of F were derived after calculating the maximum and minimum cumulative number of calls and service uses per GSM.


Monetary (M): the money spent during a certain period. Customers were divided into monetary categories (Table 7) according to the total amount each customer paid for transactions with the organization over the past 3 months. M1 represents customers who paid 100 or less in the last 3 months, and M5 represents customers who paid 10,000 or more in the last 3 months.

Table 7 Monetary

M1: 100 or less
M5: 10,000 or more

In our study, five levels of M were derived after calculating the maximum and minimum cumulative amount spent on calls and service use per GSM.

Based on the TF results calculated above (Table 8), we calculate the TFM results (Table 9).

Table 8 Assessment criteria for TF score (time–frequency)

Table 9 Assessment criteria for TFM score
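The actual rules for combining T and F into a TF score, and TF and M into a TFM score, are defined by Tables 8 and 9, whose contents are not reproduced here. Purely to illustrate the two-stage structure, the sketch below assumes each stage takes the mean of its two inputs, rounded half up. This rule and the function names are hypothetical, not the study's criteria.

```python
def mean_half_up(a, b):
    """Mean of two integer levels, with .5 rounded up."""
    return (a + b + 1) // 2


def tfm_score(t, f, m):
    """Combine T, F, M levels (each 1..5) in two stages: TF first, then TFM."""
    for level in (t, f, m):
        if level not in range(1, 6):
            raise ValueError("levels must be integers from 1 to 5")
    tf = mean_half_up(t, f)      # stage 1: time-frequency score (assumed rule)
    return mean_half_up(tf, m)   # stage 2: add the monetary level (assumed rule)


print(tfm_score(5, 4, 2))  # 4
print(tfm_score(1, 1, 1))  # 1
```

Whatever rule the tables define, keeping the two stages separate in code makes it easy to swap in a lookup table for either stage.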

Segment and target customers

Customer categories

Customer segmentation involves splitting the customer base into subsets whose members share the same interests and spending habits. Based on the TFM results calculated above, customers can be divided into five segments:

  • Very high value customers (greatest loyalty): These are the customers who generate the highest profit for the operator. Without them, the operator will lose its market share and competitive advantage. These customers are given appropriate care and attention by the operator.

  • High value customers (great loyalty): These are customers who generate high profit for the operator. Without them, the operator will lose its market share and competitive advantage. These customers are given appropriate care and attention by the operator.

  • Medium value customers (average loyalty): These are the customers who generate medium profit.

  • Low value customers (little loyalty): These are the customers who generate very little profit.

  • Churning customers (very little loyalty).

Customers with the least loyalty are those who have left the company or are about to leave. Efforts were made to prevent them from leaving, and if they do leave, the cost of losing them is calculated. The total cost associated with these customers if they end their relationship is given by Eq. (1):

Total cost (customer leakage) = lost revenue + marketing cost (1)

Lost revenue is the revenue these customers would have generated had they not ended their relationship with the operator. Marketing cost is the cost of replacing these customers with new ones.
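Reading the total cost in Eq. (1) as lost revenue plus the marketing cost of replacing the customer, the calculation can be sketched as below. The function name and all amounts are hypothetical and serve only to show the arithmetic.

```python
def churn_total_cost(lost_revenue, marketing_cost):
    """Eq. (1), read as a sum: total cost of losing one customer."""
    return lost_revenue + marketing_cost


# Hypothetical at-risk customers: (expected lost revenue, replacement cost).
at_risk = [(1200, 300), (800, 300), (450, 300)]
per_customer = [churn_total_cost(rev, mkt) for rev, mkt in at_risk]
total = sum(per_customer)
print(per_customer, total)  # [1500, 1100, 750] 3350
```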

Target customers

By calculating the TFM score separately for total call time, SMS, and total Internet data, high-value customers were identified, as were customers likely to leave the company. Today, however, most people subscribe to operator packages that bundle call pricing with an Internet data allowance. The notable exceptions are people who use the operator's services only for the Internet or only for calls, not both. Therefore, to target customers, both the TFM score for total call time and the TFM score for SMS and Internet data were considered (Tables 10, 11).

Table 10 TFM score for the total call duration and the total amount of Internet data/SMS with (high, low) loyalty

Axes: Internet data/SMS; total time of call

In both TFM score for total call time and number of messages.

Table 11 TFM score for the total call duration and the total amount of Internet data/SMS with multi-level loyalty

Axes: Internet data/SMS; total time of call

If TFM is high, customers use a large amount of Internet data, spend a lot of time on calls, and send a large number of messages. If TFM is medium, customers use an average amount of Internet data, call time, and messages. If TFM is low, customers use less Internet data, spend less time on calls, and send fewer messages.

Heavy users can be targeted with loyalty points and personalized offers, as they are key to the operator's competitive advantage in the marketplace. Medium users can be targeted with offers that combine call rates and data bundles. For low users, operators can make offers that encourage greater use of their services, for example free late-night local calls to numbers on the same operator's network.

For potential churners, TFM scores were integrated with unstructured data such as social media data and call center feedback to predict churn accurately. Telecom operators must take customers who are likely to churn seriously because of the lost revenue and marketing cost associated with each of them. The operator should therefore make retention offers such as free talk time, free packet data for a specified period (for example, 200 MB of data for 3 days), or an additional number of messages.
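The targeting rules described above amount to a small lookup from loyalty level to offer type, with churn risk taking priority. The sketch below is one way to express that; the level names, the `choose_offer` function, and the offer strings are illustrative paraphrases of the text, not part of the study's implementation.

```python
# Offer types paraphrased from the targeting rules in the text.
OFFERS = {
    "high": "loyalty points and personalized offers",
    "medium": "combined call-rate and data-bundle offers",
    "low": "usage-stimulating offers, e.g. free late-night on-net calls",
}
RETENTION_OFFER = "retention offers, e.g. free talk time or data for a limited period"


def choose_offer(tfm_level, likely_to_churn=False):
    """Pick an offer type from the TFM level; churn risk overrides the level."""
    if likely_to_churn:
        return RETENTION_OFFER
    if tfm_level not in OFFERS:
        raise ValueError(f"unknown TFM level: {tfm_level!r}")
    return OFFERS[tfm_level]


print(choose_offer("high"))
print(choose_offer("low", likely_to_churn=True))
```

Checking churn risk first reflects the text's point that likely churners warrant retention offers regardless of their current usage level.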