Methods

Data Description

The PISA 2012 log file dataset for the problem-solving item was downloaded from http://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm. The dataset consists of 4722 rows, one per action, from 426 students, and 11 columns, one per variable. The 11 variables (see Figure 2) are as follows: cnt indicates the country, which is USA in the present study; schoolid and StIDStd are the unique school and student IDs, respectively; event_number (ranging from 1 to 47) is the cumulative number of actions the student had taken; event_value (see the raw event values in Table 1) records the specific action the student took at a given time stamp; time gives the exact time stamp (in seconds) corresponding to the event_value; event indicates the nature of the action (start item, end item, or an action in process); and network, fare_type, ticket_type, and number_trips together describe the choice the student had made at that point.

Only four variables were used in the analysis: schoolid, StIDStd, event_value, and time. The ID variables identified students, while event_value and time were used to generate features. The log file did not include student scores, so scores were hand coded according to the scoring rule and carefully double checked. Among the 426 students, 121 (28.4%) received full credit, 224 (52.6%) received partial credit, and 81 (19.0%) received no credit. Full, partial, and no credit were coded as 2, 1, and 0, respectively.
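To make the data layout concrete, the following is a minimal sketch (not the processing code used in the study) of how the four retained variables could be arranged into one ordered action sequence per student, and how the reported credit percentages follow from the counts. The rows, the event_value strings "A", "B", and "C", and the helper names group_actions and credit_shares are hypothetical and serve only as illustration; keying students on the pair (schoolid, StIDStd) is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical rows with the columns described above; the IDs,
# event_value strings, and times are invented for illustration.
rows = [
    {"schoolid": "001", "StIDStd": "0001", "event_number": 2, "event_value": "B", "time": 12.5},
    {"schoolid": "001", "StIDStd": "0001", "event_number": 1, "event_value": "A", "time": 3.0},
    {"schoolid": "002", "StIDStd": "0007", "event_number": 1, "event_value": "C", "time": 4.2},
]


def group_actions(rows):
    """Map each (schoolid, StIDStd) pair to its actions, ordered by event_number."""
    by_student = defaultdict(list)
    for row in rows:
        by_student[(row["schoolid"], row["StIDStd"])].append(row)
    return {
        key: [
            (r["event_value"], r["time"])
            for r in sorted(records, key=lambda r: r["event_number"])
        ]
        for key, records in by_student.items()
    }


def credit_shares(counts):
    """Convert counts per credit code into percentages rounded to one decimal."""
    total = sum(counts.values())
    return {code: round(100 * n / total, 1) for code, n in counts.items()}


sequences = group_actions(rows)

# Credit codes: 2 = full, 1 = partial, 0 = none (counts taken from the text).
credit_counts = {2: 121, 1: 224, 0: 81}
shares = credit_shares(credit_counts)
```

Sorting by event_number rather than by time keeps the sequence well defined even if two actions happen to share a time stamp.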

Figure 2. Screenshot of the log file for one student.




Table 1.
15 raw event values and 36 generated features.