Discovering Interesting and Actionable Knowledge

For Diagnostic Data Mining

- A Deployed Technology


In many data mining applications, the input data has a target attribute that represents the objective of the application task. In data mining and machine learning, this attribute is called the class attribute. There are two main types of data mining involving this kind of data:

  1. Predictive data mining: The objective is to build predictive or classification models that can be used to classify future cases. This type of data mining has been the focus of almost the entire machine learning community, and most commercial data mining tools serve this purpose too.
  2. Diagnostic data mining: The aim is to understand the data and/or to find causes of problems and actionable knowledge in order to solve the problems. This type of data mining is crucial for engineering, manufacturing and scientific applications because engineers and scientists want to gain new knowledge, rather than just being given a prediction.

The second task has been studied by data mining researchers under the topic of interestingness, i.e., helping the user quickly find actionable knowledge among a large number of discovered patterns. However, through applications we found that existing machine learning techniques are not well suited to the second task, and existing interestingness methods are not sufficient either. We had to build a different kind of data mining system for Motorola because commercial data mining systems did not fit its needs: they cannot perform the second task well, yet the second task is the key problem for the entire engineering and manufacturing industry, where product improvement is the essence. There, data mining is needed to identify product problems and their causes; no prediction is needed.

We were also surprised to find that the current rule mining paradigm (classification rule mining or association rule mining) itself poses a major obstacle to interestingness analysis, which makes it difficult to find actionable knowledge. We discovered that

Due to the shortcomings of the existing methods, we have designed a set of new techniques to perform the second task. They are based on two novel ideas, rule cubes and general impressions. Our system, called Opportunity Map, has been deployed and is in daily use at Motorola. The first version of the system was deployed around Dec 2005. As of June 15, 2006, it had been used to analyze 11 large data sets for entirely different applications from Motorola's worldwide operations, e.g., finding causes of call performance problems and solving them, supply chain analyses, network characterization, business opportunities, etc. Some applications involve continuous analyses as new data comes in. Many pieces of discovered knowledge have already been used in Motorola's new products. The system is also commercially available. Some of the techniques are discussed in the following papers. The first paper is more research oriented, and the second paper focuses more on the Motorola application.

Publications

Older Papers on Interestingness in Data Mining

Created on Aug 4, 2006 by Bing Liu.