Read this article and pay attention to the data mining techniques, classifier development, and evaluation criteria. Then take notes and understand the difference between supervised and unsupervised learning models. Finally, read the summary and discussion section of this article. What distinctions can be made about the three major purposes of problem-solving items using data-mining techniques?
There are different types of data warehouses, and each has a specific purpose within an organization. Remember, it is important to use the correct type of warehouse to support the "decision support" model being employed. Decision support techniques such as classification, prediction, time-series analysis, association, clustering, and so on will each have their own unique data needs. Correctly designing the data warehouse will ensure the best possible evidence to support strategic and daily decisions.
Managing data is an important function in the administrative process. Because organizations use data to guide decisions, decision-makers rely on you to produce a data management plan for sustainability, growth, and strategy. As you start to interact with decision-makers and the decision-support systems they use, you will also find that additional study of the models employed through a course on quantitative methods or decision-support technology will prove useful.
Results
The tuning and training results for the four supervised learning techniques are reported first, followed by the evaluation of their performance on the test dataset. Lastly, the results for the unsupervised learning methods are presented.
Supervised Learning Methods
The tuning processes for all the classifiers reached satisfactory results. For CART, cp was set to 0.02 to achieve the minimum error with the simplest tree structure (error < 0.2, number of trees < 6), as shown in Figure 4. The final tuning parameters for gradient boosting were: number of trees = 250, depth of trees = 10, learning rate = 0.01, and minimum number of observations in the trees' terminal nodes = 10. Figure 5 shows that when the maximum tree depth equaled 10, the RMSE reached its minimum at 250 iterations with the simplest tree structure. The number of predictors sampled for splitting at each node (mtry) in the random forest was set to 4 to achieve the highest accuracy, as shown in Figure 6. In the SVM, the scale parameter σ was set to 1 and the cost value C to 4, reaching the smallest training error of 0.038.
Figure 4. The CART tuning results for cost-complexity parameter (cp).

Figure 5. The Gradient Boosting tuning results.

Figure 6. The random forest tuning results (peak point corresponds to mtry = 4).
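For readers who want to reproduce this kind of tuning, the sketch below runs a small grid search for the gradient boosting model with scikit-learn. It is a minimal illustration, not the authors' code: the study appears to have been conducted in R, the data here are synthetic stand-ins for the log-derived action features, and the grid merely brackets the reported optimum (250 trees, depth 10, learning rate 0.01, at least 10 observations per terminal node).

```python
# Hypothetical tuning sketch; synthetic data stand in for the real action features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder for the training data (three score categories: 0, 1, 2).
X_train, y_train = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=0
)

param_grid = {
    "n_estimators": [100, 250, 500],   # number of trees
    "max_depth": [5, 10],              # depth of trees
    "learning_rate": [0.01, 0.1],      # shrinkage
    "min_samples_leaf": [10],          # min observations in terminal nodes
}

# The paper tunes on error/RMSE; accuracy is used here as a simple stand-in.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0), param_grid,
    scoring="accuracy", cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
```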

The performance of the four supervised techniques is summarized in Table 2. All four methods performed satisfactorily, with almost all values larger than 0.90. Gradient boosting showed the best classification accuracy overall, exhibiting the highest Kappa and overall accuracy (Kappa = 0.94, overall accuracy = 0.96). Most of its subclass specificity and balanced accuracy values also ranked top, with only the sensitivity for score = 0, specificity for score = 1, and balanced accuracy for score = 0 smaller than those from SVM. SVM, random forest, and CART performed similarly well, all with slightly smaller Kappa and overall accuracy values (Kappa = 0.92, overall accuracy = 0.95).
Table 2. Average of accuracy measures of the scores.
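The measures in Table 2 follow directly from the confusion matrix of predicted versus true scores. The sketch below is an illustration of the metric definitions, not the authors' procedure: it computes overall accuracy, Cohen's Kappa, and per-class sensitivity, specificity, and balanced accuracy for placeholder predictions over the three score categories.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

# Illustrative placeholder labels; replace with a classifier's test-set output.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 0, 1, 2])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print("overall accuracy:", accuracy_score(y_true, y_pred))
print("Cohen's Kappa:   ", cohen_kappa_score(y_true, y_pred))

# One-vs-rest sensitivity, specificity, and balanced accuracy per score category.
for k, label in enumerate([0, 1, 2]):
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp
    fp = cm[:, k].sum() - tp
    tn = cm.sum() - tp - fn - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    bal = (sens + spec) / 2
    print(f"score={label}: sensitivity={sens:.2f} specificity={spec:.2f} balanced={bal:.2f}")
```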

Among the four supervised methods, the single tree structure from CART built from the training dataset is the easiest to interpret; it is plotted in Figure 7. Three colors represent the three score categories: red (no credit), gray (partial credit), and green (full credit). The darker the color, the more confident the predicted score in that node and the more precise the classification. Each node contains three lines of numbers: the first line indicates the main score category in that node, the second line gives the proportions of each score category, in the order of scores 0, 1, and 2, and the third line is the percentage of students falling into that node. CART has a built-in ability to automatically select useful features. As shown in Figure 7, only five features, "city_con_daily_cancel," "other_buy," "trip4_buy," "concession," and "daily_buy," were used in branching before the final stage. At each branch, if the student performed the action (>0.5), he/she is classified to the right; otherwise, to the left. As a result, students with full credit were branched into one class, in which 96% truly belonged to this class and which accounted for 29% of the total data points. Students who earned partial credit were partitioned into two classes: one consisted purely of students in this group, and the other consisted of 98% students who truly got partial credit. For the no-credit group, students were classified into three classes: one consisted purely of students in this group, and the other two included 10% and 18% students from other categories. One major benefit of this plot is that we can clearly trace the specific action sequences that led students into each class.
Figure 7. The CART classification.
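A comparable tree can be grown and plotted with scikit-learn, as in the minimal sketch below. The binary action indicators and the toy scoring rule are placeholders, and scikit-learn's ccp_alpha is analogous to, but not on the same scale as, the cp = 0.02 reported above, so the value used here is illustrative only. Because the features are 0/1 indicators, the splits fall at 0.5, mirroring the ">0.5" branching described above.

```python
# Minimal sketch of a CART fit in the spirit of Figure 7 (synthetic placeholders).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

rng = np.random.default_rng(0)
features = ["city_con_daily_cancel", "other_buy", "trip4_buy", "concession", "daily_buy"]
X = rng.integers(0, 2, size=(500, len(features)))  # 1 = action performed, 0 = not

# Toy scoring rule standing in for the real rubric-based scores (0, 1, 2).
y = X[:, features.index("trip4_buy")] + X[:, features.index("daily_buy")]

# ccp_alpha is scikit-learn's cost-complexity parameter; the value is a placeholder,
# not a conversion of the rpart-style cp = 0.02.
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)
plot_tree(tree, feature_names=features, class_names=["0", "1", "2"], filled=True)
plt.show()
```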
Unsupervised Learning Methods
As shown in Table 3, the candidates for the best clustering solution from the training dataset were k-means with 5 clusters (DBI = 0.19, kappa = 0.84) and SOM with 9 clusters (DBI = 0.25, kappa = 0.96), which satisfied the criterion of a smaller DBI value and a kappa value ≥ 0.8. When validated with the test dataset, the DBI values for both k-means and SOM increased, possibly because of the smaller sample size of the test dataset. Due to the low kappa value for the 5-cluster solution in the validation sample, the final decision on the clustering solution was SOM with 9 clusters. The percentage of students in each score category in each cluster is presented in Figure 8.
Table 3. Clustering Algorithms' Fit (DBI) and Agreement (Cohen's Kappa).

Figure 8. Percentage in each score category in the final SOM clustering solution with 9 clusters from the training dataset.
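The model-selection criterion can be made concrete with the following sketch, which computes the Davies-Bouldin index for a 5-cluster k-means solution and a 3 × 3 (nine-node) SOM using scikit-learn and the MiniSom package on synthetic placeholder data. This excerpt does not fully specify how Cohen's Kappa was computed for the clusterings, so the kappa shown here, agreement between training-fit and refit assignments after matching cluster IDs, is one plausible reading and is flagged as an assumption in the comments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import cohen_kappa_score, davies_bouldin_score
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
X_train = rng.random((400, 6))  # placeholder feature matrices
X_test = rng.random((100, 6))

# Candidate 1: k-means with 5 clusters.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_train)
print("k-means DBI (train):", davies_bouldin_score(X_train, km.labels_))

# Candidate 2: a 3x3 SOM, i.e., nine clusters (one per map node).
som = MiniSom(3, 3, X_train.shape[1], sigma=0.8, learning_rate=0.5, random_seed=0)
som.train_random(X_train, 1000)
som_labels = np.array([r * 3 + c for r, c in (som.winner(x) for x in X_train)])
print("SOM DBI (train):   ", davies_bouldin_score(X_train, som_labels))

# ASSUMPTION: one plausible reading of the kappa criterion, namely agreement on
# the test set between labels predicted by the training-fit k-means and labels
# from refitting on the test data, after Hungarian matching of cluster IDs.
pred = km.predict(X_test)
refit = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(X_test)
cost = -np.array([[np.sum((pred == i) & (refit == j)) for j in range(5)]
                  for i in range(5)])
rows, cols = linear_sum_assignment(cost)
mapping = dict(zip(cols, rows))
print("kappa (stability):", cohen_kappa_score(pred, [mapping[j] for j in refit]))
```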

To interpret, label, and group the resulting clusters, it is necessary to examine and generalize the students' features and the strategy pattern in each cluster. In alignment with the scoring rubric, and for ease of interpretation, the nine clusters identified in the training dataset are grouped into five classes, interpreted as follows (a minimal relabeling sketch follows the list).
1. Incorrect (cluster 1): students bought neither individual tickets for 4 trips nor a daily ticket.
2. Partially correct (clusters 4 and 5): students bought either individual tickets for 4 trips or a daily ticket but did not compare the prices.
3. Correct (clusters 7 and 8): students compared the prices between individual tickets and a daily ticket and chose to buy the cheaper one (individual tickets for 4 trips).
4. Unnecessary actions (clusters 2, 3, and 6): students tried options not required by the question, e.g., a country train ticket or a different number of individual tickets.
5. Outlier (cluster 9): the student made too many attempts and is identified as an outlier.
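As referenced above, turning the nine clusters into the five interpretive classes is a simple relabeling, which can then be cross-tabulated against score categories in the style of Figure 8. A minimal pandas sketch with placeholder cluster assignments and scores:

```python
# Hypothetical relabeling of the nine SOM clusters into the five classes listed
# above, followed by a Figure 8-style cross-tabulation (data are placeholders).
import numpy as np
import pandas as pd

class_map = {
    1: "Incorrect",
    4: "Partially correct", 5: "Partially correct",
    7: "Correct", 8: "Correct",
    2: "Unnecessary actions", 3: "Unnecessary actions", 6: "Unnecessary actions",
    9: "Outlier",
}

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cluster": rng.integers(1, 10, size=300),  # placeholder SOM cluster IDs 1-9
    "score": rng.integers(0, 3, size=300),     # placeholder credit scores 0/1/2
})
df["class"] = df["cluster"].map(class_map)

# Percentage of each score category within each cluster (cf. Figure 8).
print(pd.crosstab(df["cluster"], df["score"], normalize="index").round(2))
```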
Such grouping and labeling can help researchers better understand the common strategies used by students in each score category. It also helps identify the errors students made and can be a good source of feedback to students. The students mislabeled above still share the major characteristics of their cluster. For example, 4% of students who got no credit in cluster 4 in the training dataset bought a daily ticket for the city subway without comparing the prices, but bought the full fare instead of using the student concession fare. These students are different from those in cluster 1, who bought neither daily tickets nor individual tickets for 4 trips. Thus, students in the same score category were classified into different clusters, indicating that they made different errors or took different actions during the problem-solving process. In summary, though students in the same score category generally share the actions they took, they can also follow distinct problem-solving processes, and students in different score categories can also share a similar problem-solving process.