Read this article and pay attention to the data mining techniques, classifier development, and evaluation criteria. Then take notes and understand the difference between supervised and unsupervised learning models. Finally, read the summary and discussion section of this article. What distinctions can be made about the three major purposes of problem-solving items using data-mining techniques?
There are different types of data warehouses, and each has a specific purpose within an organization. Remember, it is important to use the correct type of warehouse to support the "decision support" model being employed. Decision support techniques such as classification, prediction, time-series analysis, association, clustering, and so on will each have their own unique data needs. Correctly designing the data warehouse will ensure the best possible evidence to support strategic and daily decisions.
Managing data is an important function in the administrative process. Because organizations use data to guide decisions, decision-makers rely on you to produce a data management plan for sustainability, growth, and strategy. As you start to interact with decision-makers and the decision-support systems they use, you will also find that additional study of the models employed through a course on quantitative methods or decision-support technology will prove useful.
Methods
Data Mining Techniques
This study demonstrates how to utilize
data mining techniques to map the selected features (both action and
time) to students' performance on this problem-solving item in PISA
2012. Because students' item scores are available in the data file,
supervised learning algorithms can be trained to help classify students
based on their known item performance (i.e., score category) in the
training dataset, while unsupervised learning algorithms categorize
students into groups based on input variables without knowing their item
performance. These data mining techniques make no assumptions about
the data distribution.
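To make the supervised/unsupervised distinction concrete, the following is a minimal sketch in Python (the article itself reports using R; this is an illustration only, not the authors' code). The toy features, scores, and function names are all hypothetical. The supervised part uses a simple nearest-centroid classifier, chosen for brevity rather than because the article uses it, which learns from the known scores; the unsupervised part uses a bare-bones k-means that never sees the scores.

```python
# Hypothetical toy data: (number of actions, time on task in seconds).
features = [(4, 30), (5, 35), (6, 32), (20, 90), (22, 95), (21, 88)]
# Known item scores; only the supervised model is allowed to see these.
scores = [0, 0, 0, 1, 1, 1]


def distance(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def fit_nearest_centroid(X, y):
    """Supervised: learn one centroid per known score category."""
    return {
        label: mean_point([x for x, lab in zip(X, y) if lab == label])
        for label in set(y)
    }


def predict(centroids, x):
    """Assign x to the score category with the closest centroid."""
    return min(centroids, key=lambda label: distance(x, centroids[label]))


def kmeans(X, k, iterations=10):
    """Unsupervised: group points without ever looking at the scores."""
    centers = list(X[:k])  # naive, deterministic start (first k points)
    assignment = [0] * len(X)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: distance(x, centers[c])) for x in X
        ]
        for c in range(k):
            members = [x for x, a in zip(X, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean_point(members)
    return assignment


centroids = fit_nearest_centroid(features, scores)
print(predict(centroids, (7, 33)))  # predicted score category for a new student
print(kmeans(features, k=2))        # cluster ids, found without the scores
```

Note that the cluster ids returned by k-means are arbitrary labels: they identify groups of similar students, and whether those groups line up with score categories has to be checked afterward.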
Four supervised learning methods, Classification and Regression Tree
(CART), gradient boosting, random forest, and support vector machine
(SVM), are explored to develop classifiers, while two unsupervised
learning methods, Self-organizing Map (SOM) and k-means, are utilized
to further examine different strategies used by students in both the
same and different score categories. CART was chosen because
it worked effectively in a previous study and
is known for its quick computation and simple interpretation. However,
it might not perform as well as other methods.
Furthermore, small changes in the data can change the tree structure
dramatically. Thus, gradient boosting and random forest,
which can improve the performance of trees via ensemble methods, were
also used for comparison. Though SVM has not been used much in the
analysis of process data yet, it has been applied as one of the most
popular and flexible supervised learning techniques for other
psychometric analyses, such as automatic scoring. The two
clustering algorithms, SOM and k-means, have been applied in the
analysis of process data in log files. Researchers have suggested using more than one
clustering method to validate the clustering solutions. All the analyses were conducted in the software program RStudio.
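One way to act on the suggestion of using more than one clustering method is to quantify how closely two solutions agree. The sketch below, in Python for illustration only (the article reports using R), computes the Rand index, the share of student pairs that both solutions treat the same way (together in both, or apart in both). The choice of the Rand index and the example labels are assumptions made here, not the article's reported procedure.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of item pairs on which two clusterings agree (0.0 to 1.0)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same students")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two students to compare")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for six students from two methods.
som_labels = [0, 0, 1, 1, 2, 2]
kmeans_labels = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
coarser_labels = [0, 0, 0, 1, 1, 1]  # a different, two-cluster grouping

print(rand_index(som_labels, kmeans_labels))   # identical partitions
print(rand_index(som_labels, coarser_labels))  # partial agreement
```

Because the index compares pairs rather than label values, it is unaffected by the arbitrary numbering each method gives its clusters. A value near 1 suggests the two methods recover similar groupings; a low value suggests the solution depends heavily on the method.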