10. The Knowledge Discovery Process
The CRISP-DM KDP model consists of six steps, which are summarized below:
-
Business understanding. This step focuses on the understanding of objectives and requirements from a business perspective. It also converts these into a DM problem definition, and designs a preliminary project plan to achieve the objectives. It is further broken into several substeps, namely,
-
determination of business objectives,
-
assessment of the situation,
-
determination of DM goals, and
-
generation of a project plan.
-
Data understanding. This step starts with initial data collection and familiarization with the data. Specific aims include identification of data quality problems, initial insights into the data, and detection of interesting data subsets. Data understanding is further broken down into
-
collection of initial data,
-
description of data,
-
exploration of data, and
-
verification of data quality.
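The description and quality-verification substeps can be illustrated with a minimal sketch. The toy records, the attribute names, and the helper functions `describe`, `missing_counts`, and `duplicate_count` are all hypothetical and are not part of CRISP-DM itself; a real project would typically rely on a dedicated data-profiling tool.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "segment": "A"},
    {"age": None, "income": 48000, "segment": "B"},
    {"age": 51, "income": None, "segment": "A"},
    {"age": 29, "income": 61000, "segment": None},
    {"age": 34, "income": 52000, "segment": "A"},  # exact duplicate of the first record
]

def describe(rows):
    """Description of data: number of records and the attribute names."""
    attributes = sorted({key for row in rows for key in row})
    return {"n_records": len(rows), "attributes": attributes}

def missing_counts(rows):
    """Verification of data quality: missing values per attribute."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return dict(counts)

def duplicate_count(rows):
    """Verification of data quality: records that repeat an earlier record."""
    # Sort by attribute name only, so values (which may be None) are never compared.
    seen = Counter(tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows)
    return sum(n - 1 for n in seen.values())

summary = describe(records)
missing = missing_counts(records)
duplicates = duplicate_count(records)
print(summary, missing, duplicates)
```

Counts of this kind feed directly into the next step, where a policy for handling missing values and duplicates has to be chosen.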
-
Data preparation. This step covers all activities needed to construct the final dataset, which constitutes the data that will be fed into DM tool(s) in the next step. It includes table, record, and attribute selection; data cleaning; construction of new attributes; and transformation of data. It is divided into
-
selection of data,
-
cleansing of data,
-
construction of data,
-
integration of data, and
-
formatting of data substeps.
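The five data preparation substeps can be sketched on a toy table as follows. The records, the second source `regions`, the derived attribute, and the column order are assumptions made only for illustration; in particular, dropping incomplete records is just one of several possible cleansing policies.

```python
# Hypothetical raw table; None marks a missing value.
raw = [
    {"id": 1, "age": 34, "income": 52000, "notes": "n/a"},
    {"id": 2, "age": None, "income": 48000, "notes": "call back"},
    {"id": 3, "age": 51, "income": 75000, "notes": ""},
]
# Hypothetical second data source, keyed by the same "id".
regions = {1: "north", 2: "south", 3: "north"}

def prepare(rows, region_by_id):
    prepared = []
    for row in rows:
        # Selection: keep only the attributes judged relevant.
        selected = {k: row[k] for k in ("id", "age", "income")}
        # Cleansing: drop records with missing values (an assumed policy).
        if any(v is None for v in selected.values()):
            continue
        # Construction: derive a new attribute from existing ones.
        selected["income_per_year_of_age"] = selected["income"] / selected["age"]
        # Integration: merge in an attribute from the second source.
        selected["region"] = region_by_id[selected["id"]]
        prepared.append(selected)
    # Formatting: fix record and column order as the (assumed) DM tool expects.
    columns = ["id", "region", "age", "income", "income_per_year_of_age"]
    ordered = sorted(prepared, key=lambda r: r["id"])
    return [[r[c] for c in columns] for r in ordered]

dataset = prepare(raw, regions)
for record in dataset:
    print(record)
```

The result is a plain list of rows in a fixed column order, standing in for the final dataset that is passed to the modeling step.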
-
Modeling. At this point, various modeling techniques are selected and applied. Modeling usually involves the use of several methods for the same DM problem type and the calibration of their parameters to optimal values. Since some methods may require a specific format for input data, iteration back to the previous step is often necessary. This step is subdivided into
-
selection of modeling technique(s),
-
generation of test design,
-
creation of models, and
-
assessment of generated models.
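As a rough illustration of the modeling substeps, the sketch below selects one technique (a k-nearest-neighbor classifier on a single feature), fixes a test design (a deterministic train/validation/test split), creates models for several values of the parameter k, and assesses them. The data, the split rule, and the candidate values of k are assumptions chosen to keep the example small; they are not prescribed by CRISP-DM.

```python
# Hypothetical labelled data: (feature value, class label).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 1), (3.5, 0),
        (4.0, 1), (4.5, 1), (5.0, 1), (5.5, 1), (2.2, 0), (4.2, 1)]

# Test design: a fixed three-way split by record index.
test = [d for i, d in enumerate(data) if i % 4 == 0]
validation = [d for i, d in enumerate(data) if i % 4 == 1]
train = [d for i, d in enumerate(data) if i % 4 >= 2]

def knn_predict(train_set, x, k):
    """Majority vote among the k training records closest to x."""
    nearest = sorted(train_set, key=lambda d: abs(d[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train_set, eval_set, k):
    """Fraction of records in eval_set that are classified correctly."""
    hits = sum(knn_predict(train_set, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)

# Creation of models and parameter calibration: one model per candidate k,
# scored on the validation records only.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

# Assessment of the selected model on records not used for calibration.
test_accuracy = accuracy(train, test, best_k)
print(scores, best_k, test_accuracy)
```

Keeping the calibration records separate from the final test records avoids an optimistic estimate of the quality of the selected model.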
-
Evaluation. After one or more models of high quality from a data analysis perspective have been built, they are evaluated from a business objective perspective. A review of the steps executed to construct the models is also performed. A key objective is to determine whether any important business issues have not been sufficiently considered. At the end of this phase, a decision about the use of the DM results should be reached. The key substeps in this step include
-
evaluation of the results,
-
process review, and
-
determination of the next step.
-
Deployment. Now the discovered knowledge must be organized and presented in a way that the customer can use. Depending on the requirements, this step can be as simple as generating a report or as complex as implementing a repeatable KDP. This step is further divided into
-
plan deployment,
-
plan monitoring and maintenance,
-
generation of final report, and
-
review of the process substeps.
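The six steps and their substeps listed above can also be written down as a simple data structure, for example to drive a project checklist. The sketch below is only one possible encoding: the names `CRISP_DM`, `STEPS`, and `next_step` are hypothetical, and the single `go_back` flag models only a return to the immediately preceding step, as described for modeling and data preparation.

```python
# Steps and substeps as listed in the text; insertion order gives the nominal sequence.
CRISP_DM = {
    "Business understanding": [
        "determination of business objectives",
        "assessment of the situation",
        "determination of DM goals",
        "generation of a project plan",
    ],
    "Data understanding": [
        "collection of initial data",
        "description of data",
        "exploration of data",
        "verification of data quality",
    ],
    "Data preparation": [
        "selection of data",
        "cleansing of data",
        "construction of data",
        "integration of data",
        "formatting of data",
    ],
    "Modeling": [
        "selection of modeling technique(s)",
        "generation of test design",
        "creation of models",
        "assessment of generated models",
    ],
    "Evaluation": [
        "evaluation of the results",
        "process review",
        "determination of the next step",
    ],
    "Deployment": [
        "plan deployment",
        "plan monitoring and maintenance",
        "generation of final report",
        "review of the process",
    ],
}
STEPS = list(CRISP_DM)

def next_step(current, go_back=False):
    """Return the following step, or the preceding one when iteration is needed."""
    i = STEPS.index(current)
    if go_back:
        return STEPS[max(i - 1, 0)]
    return STEPS[min(i + 1, len(STEPS) - 1)]

print(next_step("Modeling", go_back=True))  # input format issue: return to data preparation
```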