Read this article and pay attention to the data mining techniques, classifier development, and evaluation criteria. Then take notes and understand the difference between supervised and unsupervised learning models. Finally, read the summary and discussion section of this article. What distinctions can be made about the three major purposes of problem-solving items using data-mining techniques?
There are different types of data warehouses, and each has a specific purpose within an organization. Remember, it is important to use the correct type of warehouse to support the "decision support" model being employed. Decision support techniques such as classification, prediction, time-series analysis, association, clustering, and so on will each have their own unique data needs. Correctly designing the data warehouse will ensure the best possible evidence to support strategic and daily decisions.
Managing data is an important function in the administrative process. Because organizations use data to guide decisions, decision-makers rely on you to produce a data management plan for sustainability, growth, and strategy. As you start to interact with decision-makers and the decision-support systems they use, you will also find that additional study of the models employed through a course on quantitative methods or decision-support technology will prove useful.
Methods
Feature Generation and Selection
Feature Generation
Features
generated can be categorized into time features and action features, as
summarized in Table 1. Four time features were created: T_time, A_time,
S_time, and E_time, indicating total response time, action time spent
in process, starting time spent on the first action, and ending time
spent on the last action, respectively. It was assumed that students
with different ability levels may differ in the time they read the
question (starting time on the first action), the time they spent
during the response (action time in process), and the time they took to
make a final decision (ending time on the last action). Researchers
have proposed various joint modeling approaches for response accuracy
and response time that establish a relationship between the two; total
response times are therefore expected to differ as well.
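The four time features can be sketched as follows; the log format (an ordered list of (action, timestamp) pairs per student) and the example values are illustrative assumptions, not taken from the article.

```python
# Sketch of the four time features (T_time, A_time, S_time, E_time).
# Assumes each student's log is a time-ordered list of (action, timestamp)
# pairs whose first entry is the item "Start" event and whose last entry
# is the final (submitting) action. This format is hypothetical.

def time_features(log):
    timestamps = [t for _, t in log]
    t_time = timestamps[-1] - timestamps[0]   # total response time
    s_time = timestamps[1] - timestamps[0]    # starting time: before first action
    e_time = timestamps[-1] - timestamps[-2]  # ending time: spent on last action
    a_time = t_time - s_time - e_time         # action time in between (in process)
    return {"T_time": t_time, "A_time": a_time,
            "S_time": s_time, "E_time": e_time}

# toy example: Start at 0 s, first action at 12 s, last action at 35 s
feats = time_features([("Start", 0.0), ("city_subway", 12.0),
                       ("concession", 20.0), ("Buy", 35.0)])
```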
In
this study, action features were created by coding adjacent actions of
different lengths into sequences. In total, 32 action features were
generated: 12 consisting of a single action (unigrams), 18 containing
two ordered adjacent actions (bigrams), and 2 created from four
sequential actions (four-grams). All generated action sequences were
assumed to be equally important, and no weights were assigned. In
Table 1, "concession" is a unigram, consisting of a single action: the
student bought the concession fare. "S_city," on the other hand, is a
bigram consisting of two actions, "Start" and "city subway,"
representing that the student selected the city subway ticket after
starting the item.
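The n-gram coding described above can be sketched in a few lines; the example action sequence and the joined feature names are illustrative, not the article's exact coding.

```python
from collections import Counter

def ngram_counts(actions, n):
    """Count n-grams of adjacent actions, joining action names with '_'
    in the style of the article's feature names (e.g. 'S_city' for
    Start -> city subway). Naming convention is an assumption."""
    return Counter("_".join(actions[i:i + n])
                   for i in range(len(actions) - n + 1))

# one student's hypothetical action sequence
seq = ["Start", "city", "con", "daily", "cancel"]
unigrams  = ngram_counts(seq, 1)  # single actions, e.g. 'con'
bigrams   = ngram_counts(seq, 2)  # ordered pairs, e.g. 'Start_city'
fourgrams = ngram_counts(seq, 4)  # e.g. 'city_con_daily_cancel'
```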
Sao
Pedro et al. showed that features generated should be
theoretically important to the construct to achieve better
interpretability and efficiency. Following their suggestion, features
were generated as the indicators of the problem-solving ability measured
by this item, which is supported by the scoring rubric. For example,
one action sequence consisting of four actions, coded as
"city_con_daily_cancel," is crucial to scoring. If the student first
chose "city_subway" to tour the city, then used the student's
concession fare ("concession"), next looked at the price of the daily
pass ("daily"), and lastly clicked "Cancel" to see the other option,
this action sequence is necessary, but not sufficient, for full credit.
The
final recoded dataset for analysis comprises 426 students as rows and
36 features (32 action-sequence features and 4 time features) as
columns. Scores for each student served as known labels
when applying supervised learning methods. The frequency of each
generated action feature was calculated for each student.
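Assembling the per-student frequencies into the students-by-features matrix can be sketched as follows; the student IDs, feature names, and counts are toy values standing in for the actual 426 × 36 dataset.

```python
# Build a students x features frequency matrix from per-student action
# feature counts (e.g. the Counter output of an n-gram coding step).
# All names and values here are illustrative.
feature_names = ["con", "Start_city", "city_con_daily_cancel"]
student_counts = {
    "s1": {"con": 2, "Start_city": 1},  # s1 never produced the four-gram
    "s2": {"city_con_daily_cancel": 1},
}
# missing features default to a frequency of 0
matrix = [[counts.get(f, 0) for f in feature_names]
          for counts in student_counts.values()]
```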
Feature Selection
Feature
selection should be based on both the theoretical framework and the
algorithms used. As features were generated from a purely theoretical
perspective in this study, no further theoretical screening was needed
during feature selection.
Two other issues that need consideration are redundant
variables and variables with little variance. Tree-based methods handle
these two issues well and have built-in mechanisms for feature
selection. The feature importance indicated by the tree-based methods
is shown in Figure 3. In both random forest and gradient boosting, the
most important feature is "city_con_daily_cancel." The next most
important is "other_buy," meaning the student did not choose trip_4
before the action "Buy." Feature importance from tree-based methods is
especially helpful when selection has to be made among hundreds of
features. It can help to narrow down the number of features to track,
analyze, and interpret. Redundant variables can reduce the
classification accuracy of the support vector machine (SVM). However,
given that the number of features (36) is relatively small in the
current study, deleting highly correlated variables (ρ ≥ 0.8) did not
improve classification accuracy for the SVM.
Figure 3. Feature importance indicated by tree-based methods.
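The two selection checks discussed above can be sketched as follows, using simulated data since the article's dataset is not reproduced here: ranking features by random-forest importance, and dropping one feature from each highly correlated pair (ρ ≥ 0.8) before fitting an SVM. The scikit-learn API shown is one common way to obtain tree-based importances, not necessarily the authors' implementation.

```python
# Simulated stand-in for the 426-student x 36-feature dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(426, 36)).astype(float)  # feature frequencies
y = rng.integers(0, 2, size=426)                      # known score labels

# (1) rank features by random-forest importance (gradient boosting
# exposes the same feature_importances_ attribute in scikit-learn)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]  # most important first

# (2) drop one feature from each pair with |correlation| >= 0.8
corr = np.corrcoef(X, rowvar=False)
to_drop = {j for i in range(corr.shape[0])
           for j in range(i + 1, corr.shape[1]) if abs(corr[i, j]) >= 0.8}
X_svm = np.delete(X, sorted(to_drop), axis=1)  # input for the SVM
```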

Clustering
algorithms are affected by variables with near-zero variance. Fossey
and Kerr et al. discarded variables with 5 or fewer attempts in their
studies. However, their data were binary, and no clear-cut criterion
exists for feature elimination when clustering algorithms are applied
to process data. In the current study, 5 features with variance no
greater than 0.09 in both the training and test datasets were removed
to achieve optimal classification results.
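The near-zero-variance filter can be sketched as follows; the 0.09 threshold comes from the text, while the toy data and helper name are illustrative.

```python
# Drop features whose variance does not exceed a threshold (0.09 here,
# per the text). Data are simulated; in the study the same filter would
# be checked on both the training and test sets.
import numpy as np

def low_variance_mask(X, threshold=0.09):
    """Return a boolean mask of columns to KEEP (variance > threshold)."""
    return X.var(axis=0) > threshold

X_train = np.array([[1.0, 0.0, 3.0],
                    [1.0, 0.1, 1.0],
                    [1.0, 0.0, 2.0]])
keep = low_variance_mask(X_train)
X_reduced = X_train[:, keep]  # constant and near-constant columns removed
```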
In
summary, the full set of 36 features was retained for the tree-based
methods and SVM, while 31 features were selected for SOM and k-means
after the deletion of features with little variance.