Opinion Mining, Sentiment Analysis, and Opinion Spam Detection

Feature-Based Opinion Mining and Summarization
(or Aspect-Based Sentiment Analysis and Summarization)
Product Review Datasets and Opinion Lexicon
Detecting Fake Reviews
(Media coverage: The New York Times, BusinessWeek and more ... )


Textbook: Web Data Mining - Exploring Hyperlinks, Contents and Usage Data, Ch. 11: Opinion Mining, Second Edition, July 2011.

See "Feature-Based Opinion Mining and Summarization" in Microsoft Live/Bing Search and Google Product Search (paper).

NLP Handbook Chapter: Sentiment Analysis and Subjectivity, 2nd Edition (Editors: N. Indurkhya and F. J. Damerau), 2010.

Opinion Parser: A new industrial-strength sentiment analysis system (older systems: FBS and Opinion Observer). It has been licensed to a company.

New Tutorial: Sentiment Analysis Tutorial - (references), given at AAAI-2011, August 8, 2011, San Francisco, USA.

Some Invited Talks, Tutorials and Interviews on the Topic

Acknowledgement: This project is partially funded by the National Science Foundation (NSF), Microsoft Corporation, and Google.

1. Introduction

This work is in the general area of sentiment analysis, opinion extraction or opinion mining, and feature-based opinion summarization from user-generated content or user-generated media on the Web, e.g., reviews, forum and group discussions, and blogs. In our KDD-2004 paper, we proposed the Feature-Based Opinion Mining model, which is now also called Aspect-Based Opinion Mining (as the term feature here can be confused with the term feature used in machine learning). The output of such opinion mining is a feature-based opinion summary or aspect-based opinion summary. The area is also related to sentiment classification. Our current work is in two main areas, which reflect two kinds of opinions (or evaluations)

Recently, we also started to work on

Note: Feature-based summarization (or aspect-based summarization) also turns out to be a powerful model for normal text summarization (producing a short text from a long text) on a given topic, because a summary should naturally cover the different aspects of the topic.

2. Opinion mining or extraction from customer reviews

It is common practice for merchants selling products on the Web to ask their customers to review the products and associated services. As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a potential customer to read them all before deciding whether to buy the product. It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features or aspects of the product on which the customers have expressed their opinions, and whether those opinions are positive or negative. We do not summarize the reviews by selecting a subset of the original sentences or rewriting some of them to capture the main points, as in classic text summarization. As researchers, we always want an abstraction of the problem. Here it is.

Abstraction of the problem: Feature-based opinion summary (aspect-based opinion summary) of multiple reviews (KDD-04 and WWW-05)
Formal definitions can be found in my book "Web Data Mining". They are based on several of our papers in 2004 and 2005. The abstraction provides a model of reviews (or online opinions), describes what should be extracted from opinion sources (e.g., consumer reviews, forums, and blogs) and how the results may be organized and presented to the user. The main mining tasks are:

We have proposed several techniques to perform these tasks.
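
As a rough illustration of what a feature-based (aspect-based) opinion summary might look like as a data structure, the sketch below groups opinion sentences by aspect and by polarity. It is only a minimal sketch under stated assumptions, not the method from the papers: the function name summarize_opinions, the (aspect, polarity, sentence) input format, and the camera sentences are all hypothetical, and the hard part (extracting aspects and determining polarity from raw review text) is assumed to have been done already.

```python
from collections import defaultdict


def summarize_opinions(opinions):
    """Group opinion sentences by aspect and polarity.

    `opinions` is an iterable of (aspect, polarity, sentence) triples,
    where polarity is "positive" or "negative". This input format is an
    assumption made for illustration only.
    """
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for aspect, polarity, sentence in opinions:
        if polarity not in ("positive", "negative"):
            raise ValueError(f"unknown polarity: {polarity!r}")
        summary[aspect][polarity].append(sentence)
    return dict(summary)


# Hypothetical, hand-labeled opinions about a digital camera.
opinions = [
    ("picture quality", "positive", "The pictures are sharp."),
    ("picture quality", "negative", "Photos look washed out indoors."),
    ("picture quality", "positive", "Great image quality."),
    ("battery life", "negative", "The battery drains quickly."),
]

summary = summarize_opinions(opinions)
for aspect, groups in summary.items():
    # One line per aspect: how many reviewers liked or disliked it.
    print(f"{aspect}: {len(groups['positive'])} positive, "
          f"{len(groups['negative'])} negative")
```

Keeping the supporting sentences under each aspect, rather than only the counts, lets a reader drill down from the summary to the individual opinions.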

3. Comparative sentence and relation mining

A comparative sentence usually expresses an ordering relation between two sets of entities with respect to some shared features (or aspects). For example, the comparative sentence "Canon's optics are better than those of Sony and Nikon" expresses the comparative relation: (better, {optics}, {Canon}, {Sony, Nikon}). Comparative sentences use different language constructs from typical opinion sentences (e.g., "Canon's optics are great").

Abstraction of the problem: Extraction of comparative relations, i.e., "who is better than who on what". Again, the formal definitions can be found in my book "Web Data Mining". The main mining tasks are:
This problem has many applications. For example, a product manufacturer may want to know customer opinions of its products in comparison with those of its competitors.
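
The comparative relation from the example above, (better, {optics}, {Canon}, {Sony, Nikon}), can be held in a small record type. The sketch below is one possible representation, not the formal definition from the book: the class name ComparativeRelation, its field names, and the helper pairwise_orderings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparativeRelation:
    """One extracted relation: (relation word, features, entity set 1, entity set 2)."""
    relation_word: str
    features: frozenset
    entities_1: frozenset
    entities_2: frozenset


def pairwise_orderings(rel):
    """Expand a relation into sorted (entity1, relation word, entity2, feature) tuples."""
    return sorted(
        (e1, rel.relation_word, e2, feature)
        for feature in rel.features
        for e1 in rel.entities_1
        for e2 in rel.entities_2
    )


# "Canon's optics are better than those of Sony and Nikon"
relation = ComparativeRelation(
    relation_word="better",
    features=frozenset({"optics"}),
    entities_1=frozenset({"Canon"}),
    entities_2=frozenset({"Sony", "Nikon"}),
)

for e1, word, e2, feature in pairwise_orderings(relation):
    print(f"{e1} is {word} than {e2} on {feature}")
```

Expanding the set-valued relation into pairwise tuples is a design choice for this sketch; it makes it straightforward for a manufacturer to filter for comparisons that mention its own products.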

Opinion Lexicon (or Sentiment Lexicon)

Data Sets


Publications - (sentiment analysis)

  1. Arjun Mukherjee and Bing Liu. Modeling Review Comments. To appear in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), July 8-14, 2012, Jeju, Republic of Korea.

  2. Arjun Mukherjee and Bing Liu. Aspect Extraction through Semi-Supervised Modeling. To appear in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), July 8-14, 2012, Jeju, Republic of Korea.

  3. Lei Zhang and Bing Liu. "Extracting Resource Terms for Sentiment Analysis," to appear in the 5th International Joint Conference on Natural Language Processing (IJCNLP-2011), November 8-13, 2011, Chiang Mai, Thailand.

  4. Zhongwu Zhai, Bing Liu, Lei Zhang, Hua Xu, Peifa Jia. Identifying Evaluative Opinions in Online Discussions. Proceedings of AAAI-2011, San Francisco, USA, August 7-11, 2011.

  5. Lei Zhang and Bing Liu. "Identifying Noun Product Features that Imply Opinions." ACL-2011 (short paper), Portland, Oregon, USA, June 19-24, 2011.

  6. Guang Qiu, Bing Liu, Jiajun Bu and Chun Chen. "Opinion Word Expansion and Target Extraction through Double Propagation." Computational Linguistics, March 2011, Vol. 37, No. 1: 9-27.

  7. Zhongwu Zhai, Bing Liu, Hua Xu, Peifa Jia. "Constrained LDA for Grouping Product Features in Opinion Mining." Proceedings of PAKDD-2011, Shenzhen, China, 2011. (Best Paper Award)

  8. Zhongwu Zhai, Bing Liu, Hua Xu and Peifa Jia. "Clustering Product Features for Opinion Mining." Proceedings of Fourth ACM International Conference on Web Search and Data Mining (WSDM-2011), Feb. 9-12, 2011, Hong Kong, China.

  9. Arjun Mukherjee and Bing Liu. "Improving Gender Classification of Blog Authors." Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-10). Oct. 9-11, 2010, MIT, Massachusetts, USA.

  10. Xiaowen Ding and Bing Liu. "Resolving Object and Attribute Coreference in Opinion Mining." Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), August 23-27, Beijing, China.

  11. Zhongwu Zhai, Bing Liu, Hua Xu and Peifa Jia. "Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints." Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), August 23-27, Beijing, China.

  12. Lei Zhang and Bing Liu. "Extracting and Ranking Product Features in Opinion Documents." Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), August 23-27, Beijing, China.

  13. Bing Liu. "Sentiment Analysis: A Multifaceted Problem." Invited paper, IEEE Intelligent Systems, 25(3), 2010, pp. 76-80.

  14. Bing Liu. "Sentiment Analysis and Subjectivity." Invited Chapter for the Handbook of Natural Language Processing, Second Edition. March, 2010.

  15. Ramanathan Narayanan, Bing Liu and Alok Choudhary. "Sentiment Analysis of Conditional Sentences." Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-09). August 6-7, 2009. Singapore.

  16. Guang Qiu, Bing Liu, Jiajun Bu and Chun Chen. "Expanding Domain Sentiment Lexicon through Double Propagation." Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena, California, USA, July 11-17, 2009.

  17. Xiaowen Ding, Bing Liu and Lei Zhang. "Entity Discovery and Assignment for Opinion Mining Applications," Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-09, industrial track), June 28-July 1, 2009, Paris.

  18. Bing Liu. "Opinion Mining." Invited contribution to Encyclopedia of Database Systems, 2008.

  19. Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences." Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008.

  20. Xiaowen Ding, Bing Liu and Philip S. Yu. "A Holistic Lexicon-Based Approach to Opinion Mining." Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.

  21. Xiaowen Ding and Bing Liu. "The Utility of Linguistic Rules in Opinion Mining." SIGIR-2007 (poster paper), 23-27 July 2007, Amsterdam.

  22. Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents." Proceedings of the 29th Annual International ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR-06), Seattle 2006.

  23. Nitin Jindal and Bing Liu. "Mining Comparative Sentences and Relations." Proceedings of 21st National Conference on Artificial Intelligence (AAAI-2006), July 16-20, 2006, Boston, Massachusetts, USA.

  24. Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, in Chiba, Japan.

  25. Minqing Hu and Bing Liu. "Mining Opinion Features in Customer Reviews." Proceedings of Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, USA, July 2004.

  26. Minqing Hu and Bing Liu. "Mining and summarizing customer reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25, 2004.

Publications - (Opinion spam or fake review detection)           (Check out my Opinion Spam Detection project homepage)

  1. Arjun Mukherjee, Bing Liu, and Natalie Glance. Spotting Fake Reviewer Groups in Consumer Reviews. International World Wide Web Conference (WWW-2012), accepted for publication, Lyon, France, April 16-20, 2012. (See media coverage of this work from April 16, 2012)

  2. Guan Wang, Sihong Xie, Bing Liu, Philip S. Yu. Identify Online Store Review Spammers via Social Review Graph. ACM Transactions on Intelligent Systems and Technology, accepted for publication, 2011.

  3. Guan Wang, Sihong Xie, Bing Liu, Philip S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM-2011, 2011.

  4. Arjun Mukherjee, Bing Liu, Junhui Wang, Natalie Glance, Nitin Jindal. Detecting Group Review Spam. WWW-2011 poster paper, 2011.

  5. Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady Lauw. "Detecting Product Review Spammers using Rating Behaviors." to appear in The 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, full paper), Toronto, Canada, Oct 26 - 30, 2010.

  6. Nitin Jindal, Bing Liu and Ee-Peng Lim. "Finding Unusual Review Patterns Using Unexpected Rules" to appear in The 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, short paper), Toronto, Canada, Oct 26 - 30, 2010.

  7. Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.

  8. Nitin Jindal and Bing Liu. "Review Spam Detection." Proceedings of WWW-2007 (poster paper), May 8-12, Banff, Canada.

Created on May 15, 2004 by Bing Liu and Minqing Hu.