Opinion Mining, Extraction, Summarization and Spam Detection

FBS: Feature-Based Opinion Mining and Summarization
(With Data Sets below)

Detecting Untruthful and Fake Reviews


New Book: "Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data" (includes one chapter on sentiment analysis).

See "Feature-Based Opinion Mining" in action in the new Microsoft Live Search (9/28/2007)

Co-founded a startup company on opinion discovery

Some Talks and Tutorials on the Topic

A tutorial on sentiment analysis, opinion mining, and sentiment classification, based on Chapter 11 of the above book.

1. Introduction

This work is in the general area of opinion extraction or opinion mining, and of feature-based opinion summarization, from user-generated content (or user-generated media) on the Web, e.g., reviews, forum and group discussions, and blogs. The area is also known as sentiment analysis and is closely related to sentiment classification. Our current work is in two main areas, which reflect two kinds of opinions (or evaluations): direct opinions on products and their features, as expressed in customer reviews (Section 2), and comparisons between products, as expressed in comparative sentences (Section 3).

Recently, we have also started working on review and opinion spam analysis and detection, i.e., detecting untruthful or fake reviews. See the papers [Jindal and Liu, WWW-2007 and WSDM-2008].

Acknowledgement: This project is partially funded by Microsoft Corporation.

2. Opinion extraction or mining from customer reviews

It is common practice for merchants selling products on the Web to ask their customers to review the products and associated services. As e-commerce becomes more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a potential customer to read them all before deciding whether to buy the product. It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions. The manufacturer faces additional difficulties because many merchant sites may sell the product, and the manufacturer normally produces many kinds of products. In this research, we aim to mine and summarize all the customer reviews of a product. This summarization task differs from traditional text summarization because we mine only the features of the product on which customers have expressed opinions, and whether those opinions are positive or negative. We do not summarize the reviews by selecting or rewriting a subset of the original sentences to capture their main points, as in classic text summarization. As researchers, we always want an abstraction of the problem. Here it is.

Abstraction of the problem: Feature-based opinion summary of multiple reviews (KDD-04 and WWW-05)
Formal definitions can be found in my book "Web Data Mining". They are based on several of our papers from 2004 and 2005. The abstraction provides a model of reviews (or online opinions), describes what should be extracted from opinion sources (e.g., consumer reviews, forums, and blogs), and describes how the results may be organized and presented to the user. The main mining tasks are:

We have proposed several techniques to perform these tasks.
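To make the shape of a feature-based summary concrete, here is a minimal Python sketch of the final aggregation step only. It assumes the earlier tasks have already produced, for each review sentence, a product feature and the orientation (positive or negative) of the opinion on it. The names `OpinionTuple`, `summarize`, and `print_summary` and the sample sentences are hypothetical, and this is not the extraction technique from our papers; it only shows one plausible layout for the output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionTuple:
    """One opinion on one product feature, assumed to be already extracted."""
    review_id: int
    feature: str       # e.g. "picture quality"
    orientation: str   # assumed to be exactly "positive" or "negative"
    sentence: str      # the review sentence that expresses the opinion


def summarize(opinions):
    """Group opinions by feature, then by orientation, keeping the sentences as evidence."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for op in opinions:
        summary[op.feature][op.orientation].append(op.sentence)
    return dict(summary)


def print_summary(product, summary):
    """Print the features, most-discussed first, with counts and supporting sentences."""
    print(f"Product: {product}")
    by_volume = sorted(
        summary.items(),
        key=lambda kv: -(len(kv[1]["positive"]) + len(kv[1]["negative"])),
    )
    for feature, groups in by_volume:
        print(f"  Feature: {feature}")
        for orientation in ("positive", "negative"):
            print(f"    {orientation.capitalize()}: {len(groups[orientation])}")
            for sentence in groups[orientation]:
                print(f"      - {sentence}")


# Made-up example input, standing in for the output of the mining tasks.
opinions = [
    OpinionTuple(1, "picture quality", "positive", "The pictures are very clear."),
    OpinionTuple(2, "picture quality", "positive", "Amazing picture quality."),
    OpinionTuple(2, "battery life", "negative", "The battery dies too quickly."),
    OpinionTuple(3, "picture quality", "negative", "Indoor shots come out blurry."),
]
summary = summarize(opinions)
print_summary("Digital camera 1", summary)
```

Note that nothing from the original text is selected or rewritten as "the summary": the output is a structured count of positive and negative opinions per feature, with the sentences kept only so a reader can drill down.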

3. Comparative sentence and relation extraction

A comparative sentence usually expresses an ordering relation between two sets of entities with respect to some common features. For example, the comparative sentence "Canon's optics are better than those of Sony and Nikon" expresses the comparative relation (better, {optics}, {Canon}, {Sony, Nikon}). Comparative sentences use different language constructs from typical opinion sentences (e.g., "Canon's optics are great").
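The relation in the optics example can be held in a small record. The following Python sketch is illustrative only: the class name `ComparativeRelation` and its field names (`relation_word`, `features`, `entities_1`, `entities_2`) are hypothetical, and the code simply expands one set-level relation into the pairwise comparisons it implies.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ComparativeRelation:
    """(relation_word, features, entities_1, entities_2): entities_1 is ordered
    relative to entities_2 on the given features according to relation_word."""
    relation_word: str
    features: frozenset
    entities_1: frozenset
    entities_2: frozenset

    def pairwise(self):
        """Expand into sorted (entity_1, relation_word, entity_2, feature) facts."""
        return sorted(
            (e1, self.relation_word, e2, f)
            for e1, e2, f in product(self.entities_1, self.entities_2, self.features)
        )


# "Canon's optics are better than those of Sony and Nikon."
rel = ComparativeRelation(
    "better",
    frozenset({"optics"}),
    frozenset({"Canon"}),
    frozenset({"Sony", "Nikon"}),
)
for fact in rel.pairwise():
    print(fact)
# ('Canon', 'better', 'Nikon', 'optics')
# ('Canon', 'better', 'Sony', 'optics')
```

Sets are used for the features and both entity groups because a single sentence can compare several entities on several features at once, as the example does.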

Abstraction of the problem: Extraction of comparative relations, i.e., "who is better than who on what". Again, the formal definitions can be found in my book "Web Data Mining". The main mining tasks are:
This problem has many applications. For example, a product manufacturer may want to know customer opinions of its products in comparison with those of its competitors.
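For the manufacturer scenario above, extracted relations could be aggregated per feature and per competitor. This is a minimal Python sketch under stated assumptions: relations use the (relation, features, entities_1, entities_2) form from the optics example, the sample relations are made up, each relation is assumed to come from one sentence, and `PREFERENCE` is a hypothetical two-word stand-in for deciding which side a comparative word favours (a real system has to determine this for many comparative words and features).

```python
from collections import Counter

# Hypothetical mapping: +1 means entities_1 is preferred over entities_2,
# -1 means entities_2 is preferred. Real comparatives need a much larger,
# feature-aware lexicon or a learned model.
PREFERENCE = {"better": +1, "worse": -1}

# Made-up relations in (relation_word, features, entities_1, entities_2) form.
relations = [
    ("better", {"optics"}, {"Canon"}, {"Sony", "Nikon"}),
    ("worse", {"battery life"}, {"Canon"}, {"Nikon"}),
    ("better", {"price"}, {"Sony"}, {"Canon"}),
]


def compare(target, relations):
    """Count, per (feature, competitor, verdict), how often `target` is judged better or worse."""
    tally = Counter()
    for word, features, ents1, ents2 in relations:
        direction = PREFERENCE.get(word)
        if direction is None:
            continue  # unknown relation word: skip rather than guess
        for feature in features:
            if target in ents1:
                verdict = "better" if direction > 0 else "worse"
                for other in ents2:
                    tally[(feature, other, verdict)] += 1
            elif target in ents2:
                verdict = "worse" if direction > 0 else "better"
                for other in ents1:
                    tally[(feature, other, verdict)] += 1
    return tally


for (feature, competitor, verdict), n in sorted(compare("Canon", relations).items()):
    print(f"Canon is {verdict} than {competitor} on {feature}: {n} sentence(s)")
```

Keying the counts by feature and competitor keeps "better optics than Nikon" separate from "worse battery life than Nikon", which is the distinction a manufacturer would care about.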

Data Sets

Working Papers

  1. Bing Liu. "Opinion Mining." Invited contribution to Encyclopedia of Database Systems, to be completed in July 2008.
  2. Bing Liu. "Sentiment Analysis and Subjectivity." Invited contribution to Handbook of Natural Language Processing, Second Edition, to be completed later in 2008.

Publications

  1. Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences." To appear in Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008. [Ready Soon]

  2. Xiaowen Ding, Bing Liu and Philip S. Yu. "A Holistic Lexicon-Based Approach to Opinion Mining." Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA. [Ready Soon]

  3. Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA. [Ready Soon]

  4. Nitin Jindal and Bing Liu. "Review Spam Detection." In Proceedings of WWW-2007 (poster paper), May 8-12, Banff, Canada. [PDF]

  5. Xiaowen Ding and Bing Liu. "The Utility of Linguistic Rules in Opinion Mining." SIGIR-2007 (poster paper), 23-27 July 2007, Amsterdam. [PDF]

  6. Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents." Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-06), Seattle, 2006. [PDF]

  7. Nitin Jindal and Bing Liu. "Mining Comparative Sentences and Relations." Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-2006), July 16-20, 2006, Boston, Massachusetts, USA. [PDF]

  8. Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan. [PDF]

  9. Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25, 2004. [PDF]

  10. Minqing Hu and Bing Liu. "Mining Opinion Features in Customer Reviews." Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, USA, July 2004. [PDF]

Created on May 15, 2004 by Bing Liu and Minqing Hu.