Opinion Spam Detection: Detecting Fake Reviews and Reviewers

Many names: Spam Review, Fake Review, or Bogus Review
Opinion Spammer, Review Spammer, Fake Reviewer, Shill (Stooge or Plant)
Deception, Deceptive Message
(See this front-page article in The New York Times, Jan. 26, 2012)
(Bloomberg BusinessWeek, Sept. 29, 2011 and more ... )

New Book: Sentiment Analysis and Opinion Mining (Introduction and Survey), Morgan & Claypool Publishers, May 2012.

Introduction

It has become common practice for people to find and read opinions and reviews on the Web for many purposes. For example, someone who wants to buy a product typically goes to a merchant or review site (e.g., amazon.com) to read reviews by existing users of the product. If most of the reviews are positive, the person is very likely to buy the product; if many are negative, he or she will most likely choose another product. Positive opinions can therefore bring significant financial gain and/or fame to organizations and individuals. This, unfortunately, gives strong incentives for opinion spam.

Opinion Spam: Opinion spamming refers to "illegal" activities (e.g., writing fake reviews, also called shilling) that try to deliberately mislead readers or automated opinion mining and sentiment analysis systems. Spammers do this by giving undeserved positive opinions to some target entities in order to promote them and/or by giving false negative opinions to other entities in order to damage their reputation. Opinion spam comes in many forms, e.g., fake reviews (also called bogus reviews), fake comments, fake blogs, fake social network postings, deceptions, and deceptive messages. Manually spotting such postings is very hard, but several pages on the Web (see below) tell people how to spot fake reviews and deceptive messages.

To the best of our knowledge, our group was the first in academia to conduct research on detecting fake reviews and reviewers (or shills). Our first paper was published in 2007, and subsequent papers were published in 2008 and 2010. My textbook Web Data Mining also has a section in Chapter 11 discussing the issue (Springer, Second Edition, July 2011; First Edition, Dec. 2006). The objective of our current project is to detect fake reviews. We have not worked on detecting other forms of opinion spam.

Fake Review Detection

We have used supervised learning, pattern discovery, unexpectedness defined in terms of probability, and graph-based methods for the task. Below are some of the main signals that we use:
  1. Review content:
    1. Lexical features such as words, n-grams, and part-of-speech tags.
    2. Content and style similarity of reviews from different reviewers.
    3. Semantic inconsistency (we have never used this kind of feature). For example, a reviewer writes "My wife and I bought this car ..." in one review and "My husband really love ..." in another (I heard this example from a friend at a company that actively detects fake reviews).
  2. Reviewer abnormal behaviors:
    1. Public data available from Web sites, e.g., time of posting, frequency of posting, first reviewers of products, and many more. For example, do you see anything wrong with the reviews from this user-name, Big John? What about after you see the reviews of these two user-names, Cletus and Jake? In fact, if you browse the reviews of their reviewed products, you will find another suspicious user-name/person. This is just one example of atypical behaviors that our algorithm is able to discover.
    2. Web site private/internal data (we have not used such data, but they are extremely useful), e.g., IP and MAC addresses, the time taken to post a review, the physical location of the reviewer, and many others.
  3. Product-related features: For example, product descriptions and sales ranks.
  4. Relationships: Complex relationships among reviewers, reviews, and entities (e.g., products and stores).
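
Two of the signals above, content similarity across different reviewers and posting frequency, can be illustrated with a short Python sketch. This is only a toy illustration under stated assumptions, not the method used in our papers: the function names, the (reviewer, day, text) tuple layout, the 0.7 similarity threshold, and the sample reviews are all hypothetical, and word-bigram Jaccard similarity stands in for whatever similarity measure a real system would use.

```python
from collections import defaultdict
from itertools import combinations


def word_bigrams(text):
    """Return the set of lowercase word bigrams in a review."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(reviews, threshold=0.7):
    """Find pairs of reviews by different reviewers with similar content.

    `reviews` is a list of (reviewer, day, text) tuples. The 0.7
    threshold is an arbitrary illustrative value, not a tuned one.
    """
    grams = [word_bigrams(text) for _, _, text in reviews]
    pairs = []
    for i, j in combinations(range(len(reviews)), 2):
        if reviews[i][0] != reviews[j][0]:
            score = jaccard(grams[i], grams[j])
            if score >= threshold:
                pairs.append((reviews[i][0], reviews[j][0], score))
    return pairs


def max_reviews_per_day(reviews):
    """Posting-frequency signal: each reviewer's busiest single day."""
    counts = defaultdict(int)
    for reviewer, day, _ in reviews:
        counts[(reviewer, day)] += 1
    busiest = defaultdict(int)
    for (reviewer, _), n in counts.items():
        busiest[reviewer] = max(busiest[reviewer], n)
    return dict(busiest)


# Toy data, invented for illustration only.
reviews = [
    ("alice", "2012-01-03", "great phone with a great battery and a bright screen"),
    ("bob", "2012-01-03", "great phone with a great battery and a bright display"),
    ("bob", "2012-01-03", "the best camera i have ever owned"),
    ("carol", "2012-01-09", "stopped working after two weeks"),
]

print(similar_pairs(reviews))
print(max_reviews_per_day(reviews))
```

On this toy data, the two near-identical reviews by different user-names are flagged as a similar pair, and one user-name stands out for posting twice on the same day. In practice such signals would be combined with many others rather than used alone.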

We believe that as opinions on the Web are increasingly used by consumers, organizations, and businesses for decision making, opinion spam will become both more widespread and more sophisticated. Detecting spam reviews or opinions will become more and more critical. The situation is already quite bad. When I have time, I will write more about it. You can also have a look at our papers.

Differences from Web Spam and Email Spam

Types of Opinion Spam

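There are generally three types of spam reviews (Jindal and Liu WSDM-2008):
  1. Type 1 (untruthful opinions): fake reviews that deliberately mislead readers by giving undeserved positive opinions to promote a target entity or false negative opinions to damage its reputation.
  2. Type 2 (reviews on brands only): reviews that comment on the brand, manufacturer, or seller rather than on the specific product being reviewed.
  3. Type 3 (non-reviews): advertisements and other irrelevant text containing no opinions.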

Type 2 and Type 3 spam are rare, but Type 1 spam reviews are widespread and very hard to detect. Some fake reviews are not so harmful, but some are very harmful. See details in (Jindal and Liu WSDM-2008) or Chapter 11 of my book Web Data Mining.

Acknowledgement: This project was partially funded by Microsoft and Google.

Some Fake Review Cases in the News

Professional Fake Review Writing Services

How to Spot Fake Reviews Manually

Manipulating Social Media (sock puppets - fake identities - fake personas)

China's Internet "Water Army" (Shuijun) - Opinion Spammers

Data Sets

Publications

  1. Arjun Mukherjee, Bing Liu, and Natalie Glance. Spotting Fake Reviewer Groups in Consumer Reviews. International World Wide Web Conference (WWW-2012), Lyon, France, April 16-20, 2012. (See media coverage of this work from April 16, 2012)

  2. Guan Wang, Sihong Xie, Bing Liu, Philip S. Yu. Identify Online Store Review Spammers via Social Review Graph. ACM Transactions on Intelligent Systems and Technology, accepted for publication, 2011.

  3. Guan Wang, Sihong Xie, Bing Liu, Philip S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM-2011, 2011.

  4. Arjun Mukherjee, Bing Liu, Junhui Wang, Natalie Glance, Nitin Jindal. Detecting Group Review Spam. WWW-2011 poster paper, 2011.

  5. Nitin Jindal, Bing Liu, and Ee-Peng Lim. Finding Unusual Review Patterns Using Unexpected Rules. Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, short paper), Toronto, Canada, Oct 26-30, 2010.

  6. Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Lauw. Detecting Product Review Spammers using Rating Behaviors. Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, full paper), Toronto, Canada, Oct 26-30, 2010.

  7. Nitin Jindal and Bing Liu. Opinion Spam and Analysis. Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM-2008), Stanford University, Stanford, California, USA, Feb 11-12, 2008.

  8. Nitin Jindal and Bing Liu. Review Spam Detection. Proceedings of WWW-2007 (poster paper), Banff, Canada, May 8-12, 2007.

Created by Bing Liu, 2008.