Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
Feature-Based Opinion Mining and Summarization
(or Aspect-Based Sentiment Analysis and Summarization)
Product Review Datasets and Opinion Lexicon
Detecting Fake Reviews (Media coverage: The New York Times, BusinessWeek and more ... )
Note: I don't know the techniques used by Microsoft Live/Bing (9/28/2007), but Google has a paper. For the model, see (Hu and Liu, KDD-2004) and (Liu et al., WWW-2005) below, or (better) the book above. Try searching for a camera and click on reviews; you will see summarized user opinions on product features/aspects in a bar chart.
Opinion Mining: Abstraction and Techniques. Invited talk given in the expert speaker series of the Data Science Summer Institute at UIUC, June 17, 2011. The talk has also been given in a number of universities and research labs in the summer of 2011.
Opinion Mining: What is there for Data Miners? Keynote at International Workshop on Topic Feature Discovery and Opinion Mining, Sydney, Australia, Dec 14, 2010 (at ICDM-2010).
Beyond Sentiments - Opinion Mining in the Real World. Keynote at The 2nd International Workshop on Search and Mining User-generated Contents, Toronto, Oct 30, 2010 (at CIKM-2010).
Opinion Mining: Structure the Unstructured. Keynote at The First International Workshop on Opinion Mining for Business Intelligence (OMBI'10), Toronto, Aug 31, 2010.
Sentiment Analysis: A Multifaceted Problem. Invited talk given at the Institute of Automation, Chinese Academy of Sciences, May 11, 2010.
Opinion Mining and Summarization. Invited talk given at the MAVIR symposium, Nov 19, 2009, Madrid, Spain.
Opinion Mining and Sentiment Analysis via Divide and Conquer. Invited talk given at the Workshop on Mining User-Generated Content, Aug 8, 2009, Singapore Management University, Singapore.
Opinion Mining and Summarization. Panel talk at Social Media Research Summit @ the 2008 Microsoft Research Faculty Summit, July 30, 2008, Seattle, USA.
Opinion Mining and Search, Invited talk at Google, Pittsburgh,
Sept 29, 2006. (Similar talks were also given at Yahoo!, Microsoft, Motorola Labs, I2R, University of Iowa, Tsinghua University, etc.)
Acknowledgement: This project is partially funded by the National Science Foundation (NSF), Microsoft Corporation, and Google.
1. Introduction
This work is in the general area of sentiment analysis, opinion
extraction or opinion mining, and feature-based opinion summarization
from user-generated content (or user-generated media) on the Web, e.g.,
reviews, forum and group discussions, and blogs. In our KDD-2004 paper,
we proposed the Feature-Based Opinion Mining model, which is
now also called Aspect-Based Opinion Mining (because the term feature
here can be confused with the term feature used in machine learning). The
output of such opinion mining is a feature-based opinion summary
or aspect-based opinion summary. The area is also related to
sentiment classification. Our current work is in two main areas,
which reflect two kinds of opinions (or evaluations):
Mining regular (or direct) opinions. Ex: (1). This camera is great. (2). After taking the drug, I got stomach pain.
Mining comparative opinions. Ex: Coke tastes better than Pepsi.
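The two opinion types above can be sketched as simple data structures. This is only an illustration; the field names below are my own choices, not notation from the papers.

```python
# Sketch of the two opinion types: regular (direct) and comparative.
from dataclasses import dataclass, field

@dataclass
class RegularOpinion:
    """A direct opinion on one aspect of one entity, e.g. 'This camera is great.'"""
    entity: str
    aspect: str        # e.g. "picture quality"; "GENERAL" for the entity as a whole
    sentiment: str     # "positive", "negative" or "neutral"

@dataclass
class ComparativeOpinion:
    """An ordering relation between entity sets, e.g. 'Coke tastes better than Pepsi.'"""
    comparative_word: str                          # e.g. "better"
    aspects: set = field(default_factory=set)      # shared aspects being compared
    preferred: set = field(default_factory=set)    # entities on the "better" side
    compared_to: set = field(default_factory=set)  # entities on the other side

o1 = RegularOpinion(entity="camera", aspect="GENERAL", sentiment="positive")
o2 = ComparativeOpinion("better", {"taste"}, {"Coke"}, {"Pepsi"})
print(o1)
print(o2)
```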
Recently, we also started to work on
Review and opinion spam detection, i.e., detecting fake reviews (also called bogus reviews or fraudulent reviews). See the papers [WWW-2007, WSDM-2008, CIKM-2010a, CIKM-2010b].
Note: Feature-based summarization (or aspect-based summarization) also turns out to be a powerful model for normal text summarization (producing a short text from a long text) on a given topic, because a good summary should clearly cover the different aspects of the topic.
2. Opinion mining or extraction from customer reviews
It is common practice for merchants selling products on the Web to ask their customers to review the products and associated services. As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a potential customer to read them all in order to decide whether to buy the product. It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions. The manufacturer faces additional difficulties because many merchant sites may sell the product, and the manufacturer normally produces many kinds of products. In this research, we aim to mine and summarize all the customer reviews of a product. This summarization task differs from traditional text summarization: we mine only the features or aspects of the product on which customers have expressed opinions, and determine whether those opinions are positive or negative. We do not summarize the reviews by selecting a subset of the original sentences, or by rewriting some of them to capture the main points, as in classic text summarization. As researchers, we always want an abstraction of the problem. Here it is.
Abstraction of the problem: Feature-based opinion summary (aspect-based opinion summary) of multiple reviews (KDD-04 and WWW-05)
Formal definitions can be found in my book "Web Data Mining". They are based on several of our papers in 2004 and 2005. The abstraction provides a model of reviews
(or online opinions), describes what should be extracted from opinion
sources (e.g., consumer reviews, forums, and blogs) and how the results
may be organized and presented to the user. The main mining tasks are:
mining product features (or aspects) that have been commented on by customers,
determining whether the comment/opinion on each product feature (or aspect) in each sentence is positive, negative or neutral (sentiment analysis), and
summarizing the results.
We have proposed several techniques to perform these tasks.
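As a toy illustration of the three tasks (not the actual KDD-2004 technique), the sketch below assumes a hand-built aspect list and a tiny opinion lexicon, and counts positive/negative sentences per aspect:

```python
# Toy aspect-based opinion summarizer. The aspect list and opinion
# lexicon are hand-picked here; a real system discovers the aspects
# and uses a much larger, context-sensitive lexicon.
from collections import defaultdict

ASPECTS = {"picture", "battery", "zoom"}            # assumed aspect list
POSITIVE = {"great", "amazing", "good", "sharp"}    # tiny opinion lexicon
NEGATIVE = {"poor", "short", "bad", "blurry"}

def summarize(sentences):
    """Return {aspect: {"pos": n, "neg": n}} over all review sentences."""
    summary = defaultdict(lambda: {"pos": 0, "neg": 0})
    for sentence in sentences:
        # crude tokenization: lowercase, split on spaces, strip punctuation
        words = {w.strip(".,!?") for w in sentence.lower().split()}
        for aspect in ASPECTS & words:
            if POSITIVE & words:
                summary[aspect]["pos"] += 1
            if NEGATIVE & words:
                summary[aspect]["neg"] += 1
    return dict(summary)

reviews = [
    "The picture quality is great.",
    "Battery life is too short.",
    "Amazing picture, but the zoom is poor.",
]
print(summarize(reviews))
```

The third sentence shows the limits of such sentence-level counting: "amazing" and "poor" are credited to both aspects in the sentence. Associating each opinion word with the right aspect is one of the problems the actual techniques have to address.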
3. Comparative sentence and relation mining
A comparative sentence usually expresses an ordering relation between two sets
of entities with respect to some shared features (or aspects). For example, the
comparative sentence "Canon's optics are better than those of Sony and
Nikon" expresses the comparative relation: (better, {optics}, {Canon},
{Sony, Nikon}). Comparative sentences use different language constructs
from typical opinion sentences (e.g., "Canon's optics are great").
Abstraction of the problem: Extraction of comparative relations, i.e.,
"who is better than who on what".
Again, the formal definitions can be found in my book "Web Data Mining". The main mining tasks are:
identify comparative sentences from texts, e.g., reviews, forum or blog postings, and news articles.
extract comparative relations from the identified comparative sentences.
This problem has many applications. For example, a product manufacturer may want to know customer opinions of its products in comparison with those of its competitors.
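A minimal pattern-based sketch of the two tasks follows. A real system (e.g., our AAAI-2006 work) learns sequential rules over many constructs; this regex covers only the single illustrative sentence pattern above.

```python
# Toy extractor for one comparative construct:
#   "<entity>'s <aspect> is/are better than (those of) <entity> and <entity>"
import re

PATTERN = re.compile(
    r"(?P<e1>\w+)'s (?P<aspect>\w+) (?:is|are) better than "
    r"(?:those of )?(?P<rest>[\w, ]+)"
)

def extract_relation(sentence):
    """Return (comparative_word, {aspects}, {entity_set1}, {entity_set2}) or None."""
    m = PATTERN.search(sentence)
    if m is None:
        return None  # not a comparative sentence, at least for this one pattern
    # split the trailing entity list on commas and "and"
    others = {e.strip() for e in re.split(r",| and ", m.group("rest")) if e.strip()}
    return ("better", {m.group("aspect")}, {m.group("e1")}, others)

print(extract_relation("Canon's optics are better than those of Sony and Nikon"))
```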
Although necessary, having an opinion lexicon is far from sufficient for accurate sentiment analysis. See this paper: Sentiment Analysis and Subjectivity.
Amazon Product Review Data (more than 5.8 million reviews) used in (Jindal and Liu, WWW-2007, WSDM-2008; Lim et al, CIKM-2010; Jindal, Liu and Lim, CIKM-2010; Mukherjee et al. WWW-2011; Mukherjee, Liu and Glance, WWW-2012) for opinion spam (fake review) detection. You can also use it for sentiment analysis. It has information about reviewers, review texts, ratings,
product info, etc. Due to the large file size, you may need to use Download Accelerator Plus (DAP) to download. If you use this data, please cite (Jindal and Liu, WSDM-2008).
Arjun Mukherjee and Bing Liu. "Modeling Review Comments." To appear in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), July 8-14, 2012, Jeju, Republic of Korea.
Arjun Mukherjee and Bing Liu. "Aspect Extraction through Semi-Supervised Modeling." To appear in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), July 8-14, 2012, Jeju, Republic of Korea.
Lei Zhang and Bing Liu. "Extracting Resource Terms for Sentiment Analysis." To appear in Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP-2011), November 8-13, 2011, Chiang Mai, Thailand.
Zhongwu Zhai, Bing Liu, Hua Xu and Peifa Jia. "Clustering Product Features for Opinion Mining." Proceedings of Fourth ACM
International Conference on Web Search and Data Mining (WSDM-2011),
Feb. 9-12, 2011, Hong Kong, China.
Arjun Mukherjee and Bing Liu. "Improving Gender Classification
of Blog Authors." Proceedings of Conference on Empirical
Methods in Natural Language Processing (EMNLP-10). Oct. 9-11, 2010, MIT,
Massachusetts, USA.
Ramanathan Narayanan, Bing Liu and Alok Choudhary. "Sentiment Analysis of Conditional Sentences." Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-09). August 6-7, 2009. Singapore.
Bing Liu. "Opinion Mining." Invited contribution to Encyclopedia of Database Systems, 2008.
Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences." Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008.
Xiaowen Ding, Bing Liu and Philip S. Yu. "A Holistic Lexicon-Based Approach to Opinion Mining." Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.
Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents." Proceedings of the 29th Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR-06), Seattle, 2006.
Nitin Jindal and Bing Liu. "Mining Comparative Sentences and Relations." Proceedings of 21st National Conference on Artificial Intelligence (AAAI-2006), July 16-20, 2006, Boston, Massachusetts, USA.
Minqing Hu and Bing Liu. "Mining and summarizing customer reviews." Proceedings of the ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle,
Washington, USA, Aug 22-25, 2004.
Arjun Mukherjee, Bing Liu, and Natalie Glance.
Spotting Fake Reviewer Groups in Consumer Reviews. International
World Wide Web Conference (WWW-2012),
accepted for publication, Lyon, France, April 16-20, 2012. (See media coverage of this work from April 16, 2012)
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady Lauw.
"Detecting Product Review Spammers using Rating Behaviors." to appear in
The 19th ACM International Conference on Information and Knowledge
Management (CIKM-2010, full paper), Toronto, Canada, Oct 26 - 30, 2010.
Nitin Jindal, Bing Liu and Ee-Peng Lim. "Finding Unusual Review
Patterns Using Unexpected Rules" to appear in The 19th ACM
International Conference on Information and Knowledge Management
(CIKM-2010, short paper), Toronto, Canada, Oct 26 - 30, 2010.
Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.
Nitin Jindal and Bing Liu. "Review Spam Detection." Proceedings of WWW-2007 (poster paper), May 8-12, Banff, Canada.
Created on May 15, 2004 by Bing Liu and Minqing Hu.