Data Mining Techniques in Analyzing Process
Methods
Evaluation Criteria
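For the supervised methods, each student in the test dataset was classified by the classifier built on the training dataset, and the performance of each supervised learning technique was evaluated in terms of classification accuracy. The outcome measures were overall accuracy, balanced accuracy, sensitivity, specificity, and Kappa. Because item scores fall into three categories (0, 1, and 2), sensitivity, specificity, and balanced accuracy were calculated for each category c by treating c as the positive class and the other two categories as negative:
\text{Sensitivity}_c = \frac{TP_c}{TP_c + FN_c} \qquad (1)
\text{Specificity}_c = \frac{TN_c}{TN_c + FP_c} \qquad (2)
\text{Balanced accuracy}_c = \frac{\text{Sensitivity}_c + \text{Specificity}_c}{2} \qquad (3)
where TP_c, FN_c, TN_c, and FP_c are the numbers of true positives, false negatives, true negatives, and false positives for category c. Sensitivity measures the ability to predict positive cases, specificity measures the ability to predict negative cases, and balanced accuracy is the average of the two. Overall accuracy and Kappa were calculated for each method using the following formulas: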
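\text{Overall accuracy} = \frac{\sum_{c=0}^{2} n_{cc}}{N} \qquad (4)
\kappa = \frac{p_o - p_e}{1 - p_e} \qquad (5)
where n_{cc} is the number of students whose predicted and observed scores are both c, and N is the total number of students classified. Overall accuracy measures the proportion of all predictions that are correct. The Kappa statistic is a measure of agreement for categorical data. In its formula, p_o is the observed proportion of agreement (which equals the overall accuracy in Equation 4), and p_e is the proportion of agreement expected by chance, p_e = \sum_{c} n_{c\cdot} n_{\cdot c} / N^2, where n_{c\cdot} and n_{\cdot c} are the numbers of cases observed and predicted in category c. For all five statistics, larger values indicate better classification decisions.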
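All five statistics can be computed directly from the observed and predicted scores. The following is a minimal Python sketch, not the authors' code: it assumes that sensitivity, specificity, and balanced accuracy are computed per score category (that category as positive, the other two as negative), that scores are plain lists of integers, and that chance agreement for Kappa comes from the marginal category proportions. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

CATEGORIES = (0, 1, 2)  # the three item-score categories


def _check(observed: Sequence[int], predicted: Sequence[int]) -> None:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")


def one_vs_rest_metrics(observed, predicted, categories=CATEGORIES):
    """Sensitivity, specificity and balanced accuracy per category,
    treating that category as positive and all others as negative."""
    _check(observed, predicted)
    pairs = list(zip(observed, predicted))
    results: Dict[int, Dict[str, float]] = {}
    for c in categories:
        tp = sum(o == c and p == c for o, p in pairs)
        fn = sum(o == c and p != c for o, p in pairs)
        tn = sum(o != c and p != c for o, p in pairs)
        fp = sum(o != c and p == c for o, p in pairs)
        # nan when a denominator is zero, so the ratio is undefined
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        results[c] = {"sensitivity": sens, "specificity": spec,
                      "balanced_accuracy": (sens + spec) / 2}
    return results


def overall_accuracy(observed, predicted) -> float:
    """Proportion of all predictions that match the observed score."""
    _check(observed, predicted)
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def cohens_kappa(observed, predicted) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = overall_accuracy(observed, predicted)
    n = len(observed)
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # chance agreement from the marginal category counts
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for six students
observed = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(one_vs_rest_metrics(observed, predicted))
print(overall_accuracy(observed, predicted))  # 4 of 6 predictions correct
print(cohens_kappa(observed, predicted))
```

Reporting the per-category values separately, rather than averaging them, is a design choice here; a macro average over the three categories is a common alternative when a single balanced-accuracy figure is wanted.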
For the two unsupervised learning methods, the better-fitting method and the number of clusters were determined on the training dataset using the following criteria:
1. Davies-Bouldin Index (DBI). The DBI, calculated as in Equation 6, can be used to compare the performance of multiple clustering algorithms. The algorithm with the lower DBI is considered the better-fitting one, since a lower DBI reflects higher between-cluster variance and smaller within-cluster variance.
\text{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} \qquad (6)
where k is the number of clusters, s_i and s_j are the average distances from the cluster center to each case in cluster i and cluster j, respectively, and d_{ij} is the distance between the centers of cluster i and cluster j. For each cluster i, the maximizing cluster j is the one with the smallest between-cluster distance to cluster i, the highest within-cluster variance, or both.
2. Kappa. The Kappa value (see Equation 5) measures the classification consistency between the two unsupervised algorithms; a value of at least 0.8 is usually expected.
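Equation 6 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance, cluster centers taken as coordinate-wise means, and cases given as numeric tuples; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of the cases in one cluster (assumed center)."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin_index(data: Sequence[Point], labels: Sequence[int]) -> float:
    """DBI = (1/k) * sum_i max_{j != i} (s_i + s_j) / d_ij, Euclidean distances."""
    clusters: Dict[int, List[Point]] = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    ids = sorted(clusters)
    if len(ids) < 2:
        raise ValueError("DBI needs at least two clusters")
    centers = {c: _centroid(clusters[c]) for c in ids}
    # s_i: average distance from the center of cluster i to each of its cases
    s = {c: sum(math.dist(x, centers[c]) for x in clusters[c]) / len(clusters[c])
         for c in ids}
    total = 0.0
    for i in ids:
        # largest (worst-case) ratio between cluster i and any other cluster j
        total += max((s[i] + s[j]) / math.dist(centers[i], centers[j])
                     for j in ids if j != i)
    return total / len(ids)


# Hypothetical two-dimensional cases
data = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(davies_bouldin_index(data, [0, 0, 1, 1]))  # well-separated clusters
print(davies_bouldin_index(data, [0, 1, 0, 1]))  # overlapping clusters
```

To choose the number of clusters, the index would be computed for each candidate k and for each algorithm, and the lowest value retained. scikit-learn's `sklearn.metrics.davies_bouldin_score` implements the same index if a library routine is preferred.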
To check the stability and consistency of the classifications obtained in the training dataset, the methods were repeated on the test dataset, and the DBI and Kappa values were computed again.
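Cluster labels are arbitrary: cluster 1 from one algorithm may correspond to cluster 2 from the other, so the labels must be matched before Kappa is meaningful. The text does not state how this matching was done. The Python sketch below assumes the one-to-one relabeling that maximizes agreement, found by brute force over permutations (adequate for a handful of clusters), and then applies the Kappa formula of Equation 5. The function names and the example labels are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa (p_o - p_e) / (1 - p_e) between two label vectors."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def align_labels(reference: Sequence[int], other: Sequence[int]) -> List[int]:
    """Relabel `other` with the one-to-one mapping that maximizes
    agreement with `reference` (assumed matching rule, brute force)."""
    ref_ids = sorted(set(reference))
    other_ids = sorted(set(other))
    if len(other_ids) > len(ref_ids):
        raise ValueError("`other` has more clusters than `reference`")
    best_map: Dict[int, int] = {}
    best_hits = -1
    for perm in permutations(ref_ids, len(other_ids)):
        mapping = dict(zip(other_ids, perm))
        hits = sum(mapping[o] == r for r, o in zip(reference, other))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return [best_map[o] for o in other]


def clustering_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Kappa between two clusterings of the same cases after label alignment."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label vectors must be non-empty and equal in length")
    return cohens_kappa(labels_a, align_labels(labels_a, labels_b))


# Hypothetical cluster labels from two algorithms for the same nine students
method_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
method_b = [2, 2, 1, 0, 0, 0, 1, 1, 1]
kappa = clustering_kappa(method_a, method_b)
print(kappa, kappa >= 0.8)  # compare against the 0.8 guideline
```

The same function would be applied once to the training labels and once to the test labels. For larger numbers of clusters, the brute-force search can be replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` on the negated contingency table.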