Methods

Evaluation Criterion

For the supervised methods, students in the test dataset were classified with the classifier developed on the training dataset. The performance of the supervised learning techniques was evaluated in terms of classification accuracy. Outcome measures include overall accuracy, balanced accuracy, sensitivity, specificity, and Kappa. Because item scores fall into three categories (0, 1, and 2), sensitivity, specificity, and balanced accuracy were calculated for each category, treating that category as the positive class, as follows.

\text { Sensitivity }=\frac{\text { True Positives }}{\text { True Positives }+\text { False Negatives }} (1)

\text { Specificity }=\frac{\text { True Negatives }}{\text { True Negatives }+\text { False Positives }} (2)

\text { Balanced Accuracy }=\frac{\text { Sensitivity }+\text { Specificity }}{2} (3)

where sensitivity measures the ability to correctly identify positive cases, specificity measures the ability to correctly identify negative cases, and balanced accuracy is the average of the two. Overall accuracy and Kappa were calculated for each method using the following formulas:

\text { Overall Accuracy }=\frac{\text { True Positives }+\text { True Negatives }}{\text { Total Cases }} (4)
\text { Kappa }=\frac{p_{o}-p_{e}}{1-p_{e}} (5)

where overall accuracy measures the proportion of all correct predictions. The Kappa statistic is a measure of concordance for categorical data; in its formula, p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance. The larger these five statistics are, the better the classification decisions.
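The five statistics above can be sketched in plain Python. This is a minimal illustration of Equations 1 through 5, not the authors' implementation; the function and variable names are our own, and sensitivity, specificity, and balanced accuracy are computed one category at a time, treating that category as the positive class.

```python
from collections import Counter

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and balanced accuracy (Equations 1-3)
    for one score category, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)                  # Equation (1)
    specificity = tn / (tn + fp)                  # Equation (2)
    balanced = (sensitivity + specificity) / 2    # Equation (3)
    return sensitivity, specificity, balanced

def overall_accuracy(y_true, y_pred):
    """Proportion of all correct predictions (Equation 4)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohens_kappa(y_true, y_pred):
    """Kappa = (p_o - p_e) / (1 - p_e), Equation (5)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal category frequencies.
    classes = set(y_true) | set(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in classes) / n**2
    return (p_o - p_e) / (1 - p_e)
```

For the three-category item scores, `one_vs_rest_metrics` would be called once for each of the categories 0, 1, and 2.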

For the two unsupervised learning methods, the better-fitting method and the number of clusters were determined for the training dataset by the following criteria:

1. The Davies-Bouldin Index (DBI), calculated as in Equation 6, can be used to compare the performance of multiple clustering algorithms. The algorithm with the lower DBI is considered the better fit, since it yields greater between-cluster separation and smaller within-cluster variance.

\mathrm{DBI}=\frac{1}{k} \sum_{i=1}^{k} \max _{j \neq i} \frac{S_{i}+S_{j}}{M_{i j}} (6)

where k is the number of clusters, and S_i and S_j are the average distances from each case to its cluster center in clusters i and j, respectively. M_{ij} is the distance between the centers of clusters i and j. For each cluster i, the maximization picks out the cluster j most similar to it: the cluster with the smallest between-cluster distance to cluster i, the highest within-cluster variance, or both.

2. The Kappa value (see Equation 5) is a measure of classification consistency between the two unsupervised algorithms. It is usually expected to be no smaller than 0.8.
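To illustrate the first criterion, the DBI in Equation 6 can be computed as in the following sketch. This is not the authors' implementation: the function name is our own, the data are assumed to be two-dimensional coordinates, and cluster labels are assumed to run from 0 to k-1.

```python
import math

def davies_bouldin(points, labels, k):
    """Davies-Bouldin Index (Equation 6) for 2-D points whose
    cluster assignments are given in `labels` (values 0..k-1)."""
    centers = []
    scatters = []  # S_i: average distance from each case to its cluster center
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        centers.append((cx, cy))
        scatters.append(sum(math.dist(p, (cx, cy)) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        # max over j != i of (S_i + S_j) / M_ij, where M_ij is the
        # distance between the centers of clusters i and j
        total += max((scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k
```

Two widely separated, tight clusters give a small DBI, while overlapping or diffuse clusters push it up, which is why the lower-DBI solution is preferred.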

To check the stability and consistency of the classifications obtained in the training dataset, the methods were repeated on the test dataset, and the DBI and Kappa values were computed again.