Date: Oct. 12, 2000

Dear Editor, 

Thank you very much for tentatively accepting our paper for publication 
in IEEE Intelligent Systems. We have revised our paper according to the 
reviewers' comments. 

Our replies to the comments from the reviewers and the modifications made 
to the paper are attached with this letter. They are interspersed among 
the reviewers' comments, starting with a ">>>".

   If anything else needs to be modified, please let us know. 


Thank you very much

Yours truly,

Bing Liu

*************************************************************************

           REVIEWER COMMENTS:

REVIEWER A
Section I.  Overview
                                                                              
 A   Reader Interest                                                          
                                                                              
  1  Which category describes this  Research/Technology manuscript?     
                                                         

2 How relevant is this manuscript to  Very Relevant 
    the readers of this periodical?  Please explain your rating.    
                          
                                                              
No comments
|-------------------------|
|B  Content               |
|-------------------------|

  1  Please summarize what you view as the key point(s) of the manuscript 
and the importance of this content to the readers  of this periodical.  
                                         

A comprehensive system for mining, describing, and visualizing
"interesting" association rules.
                                                                   
  2  Is the manuscript technically sound? Please explain your answer. Yes
                                              
                                                                   
|-------------------------|
|C.   Presentation        |
|-------------------------|
                                                                        
  1  Are the title, abstract, and keywords appropriate?   Yes   
          Please comment.                           
                                

No comments
                                                                        
  2  Does the manuscript contain sufficient and appropriate references? 
     References are sufficient and appropriate   Please comment.        
                                                                        
No comments
                                                                        
  3  Does the introduction state the objectives of the manuscript in    
     terms that encourage the reader to read on?   Yes  Please explain  
     your answer.                                                       

No comments
                                                                        
  4  How would you rate the organization of the manuscript?  Is it      
     focused?  Is the length appropriate for the topic?  Satisfactory   
     Please comment.                                                    

No comments
                                                                        
  5  Please rate and comment on the readability of this manuscript.     
     Readable - but requires some effort to understand                  

   Some of the sentences are a bit terse. Fixing this would improve the
   flow, but even with this problem the paper is understandable.

>>> We have read through the paper, and re-written some hard-to-understand 
    sentences.   

Section II.  Summary and Recommendation
|-------------------------|
|A.  Evaluation           |
|-------------------------|

Please rate the manuscript. Excellent Explain your choice.
                           
 B.  Recommendation        
                           
|----------------------------------------------------------------|
| (*) Accept with no changes                                     |
| ( ) Accept if certain minor revisions are made                 |
| ( ) Author should prepare a major revision for a second review |
| ( ) Reject                                                     |
|----------------------------------------------------------------|
Please make your recommendation Accept with no changes and explain your  
decision.

   The authors did a fine job of addressing my problems with the previous 
draft. The paper is now quite sound and (quite importantly) much more 
coherent and readable. This really is a different paper altogether, and 
a much better one!

Section III.  Detailed Comments
                                                  
 A.  Public Comments                              

I think the intro could use a bit more cleanup, but nothing major:

>>> Have done it. Thanks. 

"This is particularly true for association rule mining [e.g., 2, 3, 7, 14, 
27], which often produce a huge number of rules."
In the above sentence, "produce" should be "produces".

>>> We have corrected it. 

In association rule mining, such measures are support and confidence 
[2, 27, 3].

I would say "such measures *include* support and confidence", since 
clearly there are many association rule miners which use additional or 
alternative measures.

>>> Thanks. We agree "include" is a more suitable word here. We have 
    changed it in the paper.


"Actionability is partially handled through unexpectedness because 
actionable rules are either expected or unexpected. Thus, the proposed 
technique aims to find expected and unexpected association rules. 
Expected rules are also called conforming rules... "

The above prose is a bit confusing. First you say you are going to focus 
on "unexpectedness", which implies you will only be finding unexpected 
rules. But then you say that expected rules may be actionable and hence 
are interesting too... which is it? 

>>> Thanks for pointing out the confusion. We have clarified it in the 
    revised version. Basically, "unexpected rules" and "actionable rules" 
    are not mutually exclusive. Interesting rules can be classified into 
    three categories: 
	Category 1: 	rules that are both unexpected and actionable, 
	Category 2: 	rules that are unexpected but not actionable, and 
	Category 3: 	rules that are actionable but expected. 
         We have incorporated this into the paper. 

==================================
REVIEWER B

Section I.  Overview
                                                                              
 A   Reader Interest                                                          
                                                                              
1  Which category describes this  Research/Technology                       
     manuscript?                                                              
                                                                              
2  How relevant is this manuscript to  Relevant             
     the readers of this periodical?                          
     Please explain your rating.                              

No comments
|-------------------------|
|B  Content               |
|-------------------------|

1  Please summarize what you view as the key point(s) of the     
     manuscript and the importance of this content to the readers  
     of this periodical.                                           

The paper describes a method of finding user specified "interesting" 
or unexpected rules from a pre-mined collection of association rules. 
The main contributions include a language to specify what's interesting 
and a visualization component.
                                                                   
  2  Is the manuscript technically sound? Please explain your      
     answer.      Yes                                              

|-------------------------|
|C.   Presentation        |
|-------------------------|

1  Are the title, abstract, and keywords appropriate?   Yes   Please  
     comment.                                                           

No comments
                                                                        
  2  Does the manuscript contain sufficient and appropriate references? 
     References are sufficient and appropriate   Please comment.        

No comments
                                                                        
  3  Does the introduction state the objectives of the manuscript in    
     terms that encourage the reader to read on?   Yes  Please explain  
     your answer.                                                       

No comments
                                                                        
  4  How would you rate the organization of the manuscript?  Is it      
     focused?  Is the length appropriate for the topic?  Satisfactory   
     Please comment.                                                    

No comments
                                                                        
  5  Please rate and comment on the readability of this manuscript.     
     Easy to read                                                       

   No comments

Section II.  Summary and Recommendation
|-------------------------|
|A.  Evaluation           |
|-------------------------|

Please rate the manuscript. Good   Explain your choice.
                           
 B.  Recommendation        

|----------------------------------------------------------------|
| ( ) Accept with no changes                                     |
| (*) Accept if certain minor revisions are made                 |
| ( ) Author should prepare a major revision for a second review |
| ( ) Reject                                                     |
|----------------------------------------------------------------|
Please make your recommendation Accept if certain minor revisions are made
and explain your decision.

Section III.  Detailed Comments
                                                  
 A.  Public Comments                              

The main contribution of this paper is the IAS specification language and 
the visualization component. 

First, how easy is it for a user to specify beliefs or GIs, RPCs and PKs. 
Assuming that the goal is to do data exploration, then typically the user 
will not have any clue about what's interesting and what's not. Won't 
browsing rules by specifying GIs, which essentially amount to rule templates, 
be as painstaking a process in a general setting. It can however help if 
one has some background knowledge.

>>> The proposed technique aims to help users who do have background 
    knowledge. It is not suitable for the users who know little about 
    the domain. The graphic user-interface of our system also makes the 
    input of GIs and RPCs less difficult. Our KDD-99 and KDD-2000 papers 
    try to help those who perform data exploration, where summarization 
    and proper organization of rules are used to reduce the user's effort 
    in browsing the discovered rules. The two approaches are complementary. 

Second, I don't see why these GIs, RPCs and PKs have to be applied in the 
post-processing step. Why not integrate them directly into the mining 
method, which will be a more efficient approach if it can be done.

>>> There are basically two main approaches for finding subjectively 
    interesting rules. 
    1.  incorporate the user's existing knowledge in the mining algorithm 
        so as to discover only the interesting rules during the rule 
        generation phase, or 
    2.	post-analyze the discovered rules using the user's existing 
	knowledge to help the user identify the interesting 
	rules (the mining algorithm does not consider the user's knowledge). 
    The advantages and disadvantages of these approaches are discussed below. 
    We look at the first approach first.  
    *  	The advantage of the first approach is that it focuses the search 
	of the mining algorithm on only the interesting rules. It avoids 
	generating many non-interesting rules. 
    *	However, it suffers from the problem of knowledge acquisition. 
	That is, the user may know a great deal about the domain, but is 
	unable to tell what he/she knows completely. The difficulty of 
	knowledge acquisition is well documented in Expert Systems 
	literature in AI. Typically, the user needs to interact with 
	the system many times in order to provide a more complete set of 
	existing knowledge. This incremental and iterative process is, 
	however, not suitable for this approach as it is not efficient 
	to execute a mining algorithm whenever the user remembers another 
	piece of knowledge because a mining algorithm is typically 
	computationally intensive. 
    We now look at the second approach. 
    *   The mining algorithm is run only once to discover all the rules. 
	An efficient post-analysis system then helps the user to 
	incrementally analyze the discovered rules to identify the 
	interesting ones. The interactive and iterative process is the 
	key to allow the user to supply a more and more complete set of 
	existing knowledge, and to find more interesting rules. 
    *	The disadvantage is that if the user knows exactly what he/she wants, 
	producing all the rules will be wasteful. 

    As for which method should be used in an application, it depends on the 
    nature of the application. The first approach is ideal if the user is 
    absolutely sure what types of rules are interesting to him/her (and 
    nothing else). However, if the user does not have any specific rules 
    in mind and wishes to find whatever rules that are interesting, the 
    second approach will be more appropriate. In some applications, an 
    integrated approach may be preferred because there is some knowledge 
    that can be used by the mining algorithm to generate the relevant rules 
    (which can still be many). After that, post-analysis is used to help the 
    user identify the truly interesting ones.
      We have discussed these issues in the related work section of the paper. 

Third, I have misgivings about the both-sides unexpected rules case. 
Won't you find too many rules with both sides unexpected, since these 
are all the rules that do not match the specification given by the user. 
So if I have a million rules and a GI that only touches a small fraction, 
will all the other rules go into the both sides unexpected case?

>>> It is true that if the user has only one specific GI, too many rules 
    will appear as both side unexpected rules. The both-side-unexpected 
    ranking is however useful in situations where the user has specified 
    many GIs or RPCs, or the GIs (or RPCs) have covered a big portion of 
    the discovered rules. This ranking serves to remind the user what 
    he/she might have forgotten. It should also be noted that a GI or a 
    RPC can cover a large faction of the rules due to the use of class 
    hierarchy. But of course, we do not claim that this ranking is always 
    very useful. But it can be useful at times. In fact, it is one of our 
    users who suggested this ranking as it helps him to remember some 
    existing knowledge.   

The time complexity on page 11 is given as O(NUA). Isn't A exponential in 
the itemset length. In the evaluation section, you only used a small 
dataset with only about a thousand rules. What about real very large 
databases? Do you have any results on those. In general one can get 
millions of association rules. 

>>> |A| is the number of discovered rules. It is true that the number of 
    rules discovered by an association rule miner can be huge (grow 
    exponentially). That is the problem with the association rule mining 
    model. However, in practice, it is often only used for sparse data 
    (such as supermarket transactions) and with high minimum support and 
    minimum confidence thresholds. So, the number of rules generated 
    is usually manageable. 
	As the reviewer pointed out below on redundant rules, actually 
    most of the discovered association rules are indeed redundant. We 
    have studied the problem of removing redundant rules through pruning 
    in one of our KDD-99 papers. [ZakiKDD00] also discusses the problem. 
    After pruning, the number of rules left is much smaller. 
        We have not had a practical experience with millions of rules. 
    In general, this is possible if the minimum support is set very low. 
    However, rules with too low supports may not be useful in practice
    as they are usually insignificant. 

Finally, the paper doesn't mention anything about the redundancy of the 
unexpected association rules, i.e., is it possible to further prune the 
set of rules displayed by removing the redundant ones (e.g. Zaki, KDD00).

>>> We have considered this issue. The problem of redundant rules is a 
    problem of objective interestingness. Our KDD-99 paper "Pruning and 
    Summarizing the Discovered Associations" addresses this problem. 
    Pruning is used to remove those redundant and/or insignificant rules. 
    We have discussed this issue in the newly revised version, and 
    included the reference of [Zaki, KDD00]. 

All in all, the paper does reflect an interesting "user-centric" approach 
to finding unexpected rules (or even conforming ones) from a large set of 
discovered rules. 

>>> Thanks.