Discovering Interesting and Actionable Knowledge
For Diagnostic Data Mining
In many data mining applications, the input data has a target attribute,
which represents the objective of the application tasks. In data mining
and machine learning, this attribute is called the class attribute. There are
two main types of data mining involving such data:
- Predictive data mining: The objective is to build predictive
or classification models that can be used to classify future cases.
This type of data mining has been the focus of almost the entire
machine learning community. Most commercial data mining tools are
built for this purpose too.
- Diagnostic data mining: The aim is to understand the data
and/or to find the causes of problems and actionable knowledge in order
to solve the problems. This type of data mining is crucial for
engineering, manufacturing and scientific applications because engineers
and scientists want to gain new knowledge rather than simply receive a prediction.
The second task (diagnostic data mining) has been studied by data mining
researchers under the topic of interestingness, i.e., helping the user quickly
find actionable knowledge among a large number of discovered patterns. However,
through applications we found that existing machine learning techniques are not
really suitable for the second task, and existing interestingness
methods are not sufficient either. We had to build a different
kind of data mining system for Motorola because commercial data mining
systems did not fit its needs: they cannot perform the second task well,
yet the second task is the key problem of the entire engineering/manufacturing
industry, where product improvement is the essence. In this setting, data mining
is needed to identify product problems and their causes (no prediction is needed).
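The diagnostic task can be illustrated with a small sketch: count the records in each (attribute value, class) cell and derive the support and confidence of each one-condition rule `attr = value -> class = fail`. An attribute value whose failure confidence is well above the overall failure rate is a candidate cause worth examining. This is a simplified two-dimensional count table in the spirit of a rule cube (our own reading, not the definition used in Opportunity Map); the attribute `Site`, the data, and the functions `rule_cube` and `rules_from_cube` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A record maps attribute names to (categorical) values.
Record = Dict[str, str]


def rule_cube(records: List[Record], attr: str, class_attr: str) -> Counter:
    """Count records in each (attribute value, class value) cell.

    A two-dimensional slice (one condition attribute by the class attribute)
    of a count cube; a simplified illustration only.
    """
    return Counter((r[attr], r[class_attr]) for r in records)


def rules_from_cube(cube: Counter, target: str) -> List[Tuple[str, float, float]]:
    """Return (value, support, confidence) for each rule 'attr = value -> target'."""
    total = sum(cube.values())
    values = sorted({value for value, _ in cube})
    rules = []
    for value in values:
        # Number of records satisfying the condition attr = value.
        covered = sum(n for (v, _), n in cube.items() if v == value)
        # Number of those records that also have the target class
        # (a Counter returns 0 for a missing cell).
        hits = cube[(value, target)]
        rules.append((value, hits / total, hits / covered))
    return rules


# Hypothetical data: which test site is associated with failed units?
pairs = [("A", "fail"), ("A", "ok"), ("A", "ok"), ("A", "ok"),
         ("B", "fail"), ("B", "fail"), ("B", "ok"), ("B", "ok")]
records = [{"Site": s, "Outcome": o} for s, o in pairs]

cube = rule_cube(records, "Site", "Outcome")
for value, support, confidence in rules_from_cube(cube, "fail"):
    print(f"Site = {value} -> fail: support={support:.3f}, confidence={confidence:.2f}")
# Site = A -> fail: support=0.125, confidence=0.25
# Site = B -> fail: support=0.250, confidence=0.50
```

Here the overall failure rate is 3/8 = 0.375, so `Site = B` (confidence 0.50) stands out as worth investigating, while `Site = A` (0.25) does not. No predictive model is built; the counts themselves point at a possible cause.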
We were also surprised to find that the current rule mining paradigm
(classification rule mining or association rule mining) itself poses a
major obstacle to interestingness analysis, making it difficult
to find actionable knowledge. We discovered that
- An individual rule is not a piece of knowledge by itself. It can only be a piece of interesting knowledge within an implicit or explicit context.
- Individual rules may not be interesting, but a group of them may
represent an important piece of knowledge. Meta-mining is thus needed to mine more generalized knowledge, e.g., trends. An old idea called "general impressions" that we proposed in KDD-97 turned out to be very useful.
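The point about groups of rules can be illustrated with a small sketch. Assume a hypothetical group of one-condition rules whose condition values have a natural order (for example, temperature ranges). No single rule stands out, but together they may form a trend. The code summarizes the group as one trend and compares it with a user's expected trend, loosely in the spirit of a general impression. The names `trend` and `check_impression`, the data, and the strict monotonicity test are our own assumptions, not the technique used in Opportunity Map.

```python
from typing import List, Tuple

# A hypothetical one-condition rule "attr = value -> class = target",
# stored as (condition value, confidence). The list is assumed to be in
# the natural order of the condition values (e.g., ascending ranges).
Rule = Tuple[str, float]


def trend(rules: List[Rule]) -> str:
    """Summarize an ordered group of rules as a single trend.

    Returns "increasing", "decreasing", or "none". Strict monotonicity is an
    illustrative simplification; real data would need a noise-tolerant test.
    """
    confs = [conf for _, conf in rules]
    steps = list(zip(confs, confs[1:]))
    if steps and all(b > a for a, b in steps):
        return "increasing"
    if steps and all(b < a for a, b in steps):
        return "decreasing"
    return "none"


def check_impression(expected: str, rules: List[Rule]) -> str:
    """Compare a user's expected trend (a general impression) with the data."""
    found = trend(rules)
    return "conforming" if found == expected else f"unexpected: {found}"


# Hypothetical rules "Temperature = t -> Failed", temperatures in ascending order.
rules = [("low", 0.02), ("medium", 0.04), ("high", 0.07), ("very high", 0.11)]
print(trend(rules))                           # increasing
print(check_impression("decreasing", rules))  # unexpected: increasing
```

Four separate rules collapse into one statement ("failure confidence increases with temperature"), and a mismatch with the user's expectation is exactly the kind of result worth a closer look.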
Due to the shortcomings of the existing
methods, we have designed a set of new techniques to perform the second
task. They are based on two key ideas: rule cubes and general impressions. Our system, called Opportunity Map, has been deployed and is in daily use at Motorola. The first version of the system was deployed around December 2005. As of June 15, 2006, it had been used to analyze 11 large data sets for entirely different applications
from Motorola's worldwide operations, e.g., finding the causes of call
performance problems and solving them, supply chain analyses, network characterization, business
opportunities, etc. Some applications involve continuous analyses as
new data comes in. Many pieces of discovered
knowledge have already been used in Motorola's new products. The system is also
commercially available. Some of the techniques are discussed in the following papers.
The first paper is more research oriented, and the second paper focuses more
on the Motorola application.
- Bing Liu, Kaidi Zhao, Jeffrey Benkler and Weimin Xiao. "Rule Interestingness Analysis Using OLAP Operations." Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
(KDD-2006, full paper), August 20-23, 2006, Philadelphia, USA.
- Kaidi Zhao, Bing Liu, Jeffrey Benkler and Weimin Xiao. "Opportunity Map: Identifying Causes of Failure - A Deployed Data Mining System."
Proceedings of the Twelfth ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD-2006, industrial track full
paper), August 20-23, 2006, Philadelphia, USA.
Created on Aug 4, 2006 by Bing Liu.