Date: Oct. 12, 2000 Dear Editor, Thank you very much for tentatively accepting our paper for publication in IEEE Intelligent Systems. We have revised our paper according to the reviewers' comments. Our replies to the comments from the reviewers and the modifications made to the paper are attached with this letter. They are interspersed among the reviewers' comments, starting with a ">>>". If anything else needs to be modified, please let us know. Thank you very much Yours truly, Bing Liu ************************************************************************* REVIEWER COMMENTS: REVIEWER A Section I. Overview A Reader Interest 1 Which category describes this Research/Technology manuscript? 2 How relevant is this manuscript to Very Relevant the readers of this periodical? Please explain your rating. No comments |-------------------------| |B Content | |-------------------------| 1 Please summarize what you view as the key point(s) of the manuscript and the importance of this content to the readers of this periodical. A comprehensive system for mining, describing, and visualizing "interesting" association rules. 2 Is the manuscript technically sound? Please explain your answer. Yes |-------------------------| |C. Presentation | |-------------------------| 1 Are the title, abstract, and keywords appropriate? Yes Please comment. No comments 2 Does the manuscript contain sufficient and appropriate references? References are sufficient and appropriate Please comment. No comments 3 Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Yes Please explain your answer. No comments 4 How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Satisfactory Please comment. No comments 5 Please rate and comment on the readability of this manuscript. Readable - but requires some effort to understand Some of the sentences are a bit terse. Fixing this would improve the flow, but even with this problem the paper is understandable. >>> We have read through the paper, and re-written some hard-to-understand sentences. Section II. Summary and Recommendation |-------------------------| |A. Evaluation | |-------------------------| Please rate the manuscript. Excellent Explain your choice. B. Recommendation |----------------------------------------------------------------| | (*) Accept with no changes | | ( ) Accept if certain minor revisions are made | | ( ) Author should prepare a major revision for a second review | | ( ) Reject | |----------------------------------------------------------------| Please make your recommendation Accept with no changes and explain your decision. The authors did a fine job of addressing my problems with the previous draft. The paper is now quite sound and (quite importantly) much more coherent and readable. This really is a different paper altogether, and a much better one! Section III. Detailed Comments A. Public Comments I think the intro could use a bit more cleanup, but nothing major: >>> Have done it. Thanks. "This is particularly true for association rule mining [e.g., 2, 3, 7, 14, 27], which often produce a huge number of rules." In the above sentence, "produce" should be "produces". >>> We have corrected it. In association rule mining, such measures are support and confidence [2, 27, 3]. I would say "such measures *include* support and confidence", since clearly there are many association rule miners which use additional or alternative measures. >>> Thanks. We agree "include" is a more suitable word here. We have changed it in the paper. "Actionability is partially handled through unexpectedness because actionable rules are either expected or unexpected. Thus, the proposed technique aims to find expected and unexpected association rules. Expected rules are also called conforming rules... " The above prose is a bit confusing. First you say you are going to focus on "unexpectedness", which implies you will only be finding unexpected rules. But then you say that expected rules may be actionable and hence are interesting too... which is it? >>> Thanks for pointing out the confusion. We have clarified it in the revised version. Basically, "unexpected rules" and "actionable rules" are not mutually exclusive. Interesting rules can be classified into three categories: Category 1: rules that are both unexpected and actionable, Category 2: rules that are unexpected but not actionable, and Category 3: rules that are actionable but expected. We have incorporated this into the paper. ================================== REVIEWER B Section I. Overview A Reader Interest 1 Which category describes this Research/Technology manuscript? 2 How relevant is this manuscript to Relevant the readers of this periodical? Please explain your rating. No comments |-------------------------| |B Content | |-------------------------| 1 Please summarize what you view as the key point(s) of the manuscript and the importance of this content to the readers of this periodical. The paper describes a method of finding user specified "interesting" or unexpected rules from a pre-mined collection of association rules. The main contributions include a language to specify what's interesting and a visualization component. 2 Is the manuscript technically sound? Please explain your answer. Yes |-------------------------| |C. Presentation | |-------------------------| 1 Are the title, abstract, and keywords appropriate? Yes Please comment. No comments 2 Does the manuscript contain sufficient and appropriate references? References are sufficient and appropriate Please comment. No comments 3 Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Yes Please explain your answer. No comments 4 How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Satisfactory Please comment. No comments 5 Please rate and comment on the readability of this manuscript. Easy to read No comments Section II. Summary and Recommendation |-------------------------| |A. Evaluation | |-------------------------| Please rate the manuscript. Good Explain your choice. B. Recommendation |----------------------------------------------------------------| | ( ) Accept with no changes | | (*) Accept if certain minor revisions are made | | ( ) Author should prepare a major revision for a second review | | ( ) Reject | |----------------------------------------------------------------| Please make your recommendation Accept if certain minor revisions are made and explain your decision. Section III. Detailed Comments A. Public Comments The main contribution of this paper is the IAS specification language and the visualization component. First, how easy is it for a user to specify beliefs or GIs, RPCs and PKs. Assuming that the goal is to do data exploration, then typically the user will not have any clue about what's interesting and what's not. Won't browsing rules by specifying GIs, which essentially amount to rule templates, be as painstaking a process in a general setting. It can however help if one has some background knowledge. >>> The proposed technique aims to help users who do have background knowledge. It is not suitable for the users who know little about the domain. The graphic user-interface of our system also makes the input of GIs and RPCs less difficult. Our KDD-99 and KDD-2000 papers try to help those who perform data exploration, where summarization and proper organization of rules are used to reduce the user's effort in browsing the discovered rules. The two approaches are complementary. Second, I don't see why these GIs, RPCs and PKs have to be applied in the post-processing step. Why not integrate them directly into the mining method, which will be a more efficient approach if it can be done. >>> There are basically two main approaches for finding subjectively interesting rules. 1. incorporate the user's existing knowledge in the mining algorithm so as to discover only the interesting rules during the rule generation phase, or 2. post-analyze the discovered rules using the user's existing knowledge to help the user identify the interesting rules (the mining algorithm does not consider the user's knowledge). The advantages and disadvantages of these approaches are discussed below. We look at the first approach first. * The advantage of the first approach is that it focuses the search of the mining algorithm on only the interesting rules. It avoids generating many non-interesting rules. * However, it suffers from the problem of knowledge acquisition. That is, the user may know a great deal about the domain, but is unable to tell what he/she knows completely. The difficulty of knowledge acquisition is well documented in Expert Systems literature in AI. Typically, the user needs to interact with the system many times in order to provide a more complete set of existing knowledge. This incremental and iterative process is, however, not suitable for this approach as it is not efficient to execute a mining algorithm whenever the user remembers another piece of knowledge because a mining algorithm is typically computationally intensive. We now look at the second approach. * The mining algorithm is run only once to discover all the rules. An efficient post-analysis system then helps the user to incrementally analyze the discovered rules to identify the interesting ones. The interactive and iterative process is the key to allow the user to supply a more and more complete set of existing knowledge, and to find more interesting rules. * The disadvantage is that if the user knows exactly what he/she wants, producing all the rules will be wasteful. As for which method should be used in an application, it depends on the nature of the application. The first approach is ideal if the user is absolutely sure what types of rules are interesting to him/her (and nothing else). However, if the user does not have any specific rules in mind and wishes to find whatever rules that are interesting, the second approach will be more appropriate. In some applications, an integrated approach may be preferred because there is some knowledge that can be used by the mining algorithm to generate the relevant rules (which can still be many). After that, post-analysis is used to help the user identify the truly interesting ones. We have discussed these issues in the related work section of the paper. Third, I have misgivings about the both-sides unexpected rules case. Won't you find too many rules with both sides unexpected, since these are all the rules that do not match the specification given by the user. So if I have a million rules and a GI that only touches a small fraction, will all the other rules go into the both sides unexpected case? >>> It is true that if the user has only one specific GI, too many rules will appear as both side unexpected rules. The both-side-unexpected ranking is however useful in situations where the user has specified many GIs or RPCs, or the GIs (or RPCs) have covered a big portion of the discovered rules. This ranking serves to remind the user what he/she might have forgotten. It should also be noted that a GI or a RPC can cover a large faction of the rules due to the use of class hierarchy. But of course, we do not claim that this ranking is always very useful. But it can be useful at times. In fact, it is one of our users who suggested this ranking as it helps him to remember some existing knowledge. The time complexity on page 11 is given as O(NUA). Isn't A exponential in the itemset length. In the evaluation section, you only used a small dataset with only about a thousand rules. What about real very large databases? Do you have any results on those. In general one can get millions of association rules. >>> |A| is the number of discovered rules. It is true that the number of rules discovered by an association rule miner can be huge (grow exponentially). That is the problem with the association rule mining model. However, in practice, it is often only used for sparse data (such as supermarket transactions) and with high minimum support and minimum confidence thresholds. So, the number of rules generated is usually manageable. As the reviewer pointed out below on redundant rules, actually most of the discovered association rules are indeed redundant. We have studied the problem of removing redundant rules through pruning in one of our KDD-99 papers. [ZakiKDD00] also discusses the problem. After pruning, the number of rules left is much smaller. We have not had a practical experience with millions of rules. In general, this is possible if the minimum support is set very low. However, rules with too low supports may not be useful in practice as they are usually insignificant. Finally, the paper doesn't mention anything about the redundancy of the unexpected association rules, i.e., is it possible to further prune the set of rules displayed by removing the redundant ones (e.g. Zaki, KDD00). >>> We have considered this issue. The problem of redundant rules is a problem of objective interestingness. Our KDD-99 paper "Pruning and Summarizing the Discovered Associations" addresses this problem. Pruning is used to remove those redundant and/or insignificant rules. We have discussed this issue in the newly revised version, and included the reference of [Zaki, KDD00]. All in all, the paper does reflect an interesting "user-centric" approach to finding unexpected rules (or even conforming ones) from a large set of discovered rules. >>> Thanks.