Opinion Spam Detection: Detecting Fake Reviews and Reviewers

Many names: Spam Review, Fake Review, or Bogus Review
Opinion Spammer, Review Spammer, or Fake Reviewer
Deception, Deceptive Message

Introduction

It has become common practice for people to find and read opinions and reviews on the Web for many purposes. For example, someone who wants to buy a product typically goes to a merchant or review site (e.g., amazon.com) to read reviews written by existing users of the product. If one sees many positive reviews of the product, one is very likely to buy it. However, if one sees many negative reviews, he/she will most likely choose another product. Positive opinions can therefore result in significant financial gains and/or fame for organizations and individuals. This, unfortunately, gives strong incentives for opinion spam.

Opinion Spam: Opinion spamming refers to "illegal" activities (e.g., writing fake reviews) that try to deliberately mislead readers or automated opinion mining and sentiment analysis systems, either by giving undeserved positive opinions to some target entities in order to promote them and/or by giving false negative opinions to other entities in order to damage their reputation. Opinion spam comes in many forms, e.g., fake reviews (also called bogus reviews), fake comments, fake blogs, fake social network postings, deceptions, and deceptive messages. Manually spotting such postings is very hard, but several pages on the Web (see below) tell people how to spot fake reviews and deceptive messages. To the best of our knowledge, our group was the first in academia to conduct research on detecting fake reviews and reviewers. Our first paper was published in 2007, and subsequent papers were published in 2008 and 2010. My textbook Web Data Mining also discusses the issue in a section of Chapter 11 (Springer, Second Edition, July 2011; First Edition, December 2006). The objective of our current project is to detect fake reviews; we have not worked on detecting other forms of opinion spam.

Fake review detection: We have used both supervised and unsupervised methods for the task. The three main types of features (or signals) are:

Review contents: Words and other linguistic features
Reviewer abnormal behaviors: Do you see anything wrong with the reviews of this person, Big John? What about after seeing the reviews of these two people, Cletus and Jake? This is just one example of the atypical behaviors that our algorithm is able to discover.
Product features: For example, product descriptions and sales ranks
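To make the reviewer-behavior signal more concrete, here is a minimal Python sketch of two per-reviewer signals: how far a reviewer's star ratings deviate from each product's average rating (rating behavior is the subject of Lim et al., CIKM-2010), and what share of the reviewer's own reviews are near-duplicates of one another. The `Review` record, the word-bigram Jaccard measure, the 0.9 threshold, and the reviewer names are all hypothetical choices for illustration, not necessarily the features or settings used in the papers listed below. Product features such as sales rank are left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    # Hypothetical record layout; real review data will differ.
    reviewer_id: str
    product_id: str
    rating: float  # star rating, e.g. 1 to 5
    text: str


def bigrams(text: str) -> set:
    """Set of word bigrams from lower-cased, whitespace-split text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reviewer_features(reviews: list, reviewer_id: str,
                      dup_threshold: float = 0.9) -> dict:
    """Two illustrative behavior signals for one reviewer (assumed design)."""
    # Average rating of every product, computed over all reviews.
    ratings_by_product = {}
    for r in reviews:
        ratings_by_product.setdefault(r.product_id, []).append(r.rating)
    product_avg = {p: mean(rs) for p, rs in ratings_by_product.items()}

    own = [r for r in reviews if r.reviewer_id == reviewer_id]
    if not own:
        return {"num_reviews": 0.0, "rating_deviation": 0.0,
                "duplicate_pair_ratio": 0.0}

    # Rating signal: mean absolute gap between this reviewer's ratings
    # and the average rating of each product they reviewed.
    deviation = mean(abs(r.rating - product_avg[r.product_id]) for r in own)

    # Content signal: share of this reviewer's review pairs whose bigram
    # Jaccard similarity reaches dup_threshold (near-duplicates).
    grams = [bigrams(r.text) for r in own]
    pairs = [(i, j) for i in range(len(own)) for j in range(i + 1, len(own))]
    dups = sum(1 for i, j in pairs if jaccard(grams[i], grams[j]) >= dup_threshold)
    dup_ratio = dups / len(pairs) if pairs else 0.0

    return {"num_reviews": float(len(own)),
            "rating_deviation": float(deviation),
            "duplicate_pair_ratio": dup_ratio}


# Toy data: reviewer_a posts the same text for two products.
reviews = [
    Review("reviewer_a", "p1", 5, "great product works perfectly love it"),
    Review("reviewer_a", "p2", 5, "great product works perfectly love it"),
    Review("reviewer_b", "p1", 2, "battery died after a week"),
    Review("reviewer_b", "p2", 1, "stopped working quickly"),
    Review("reviewer_c", "p1", 4, "solid build but a bit heavy"),
    Review("reviewer_c", "p2", 4, "does the job for the price"),
]
print(reviewer_features(reviews, "reviewer_a"))
```

Signals like these would typically be combined with review-content and product features before being fed to a supervised or unsupervised method.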

We can safely predict that as opinions on the Web are increasingly used in practice by consumers, organizations, and businesses for their decision making, opinion spam will get worse and more sophisticated. Detecting spam reviews or opinions will become increasingly critical. The situation is already quite bad. When I have time, I will write more about it. You can also have a look at our papers.

Acknowledgement: This project has been funded by Microsoft and Google

Some Fake Review Cases in the News

(Must Read) Amazon Glitch Unmasks War Of Reviewers, February 14, 2004 [The New York Times].
Charges Settled Over Fake Reviews on iTunes, August 26, 2010 [The New York Times].
Company Settles Case of Reviews It Faked, July 14, 2009 [The New York Times].
TripAdvisor warns consumers about fake reviews, July 16, 2009 [USA Today].
Belkin's Development Rep is Hiring People to Write Fake Positive Amazon Reviews, January 16, 2009 [The Daily Background].
A Fake Amazon Reviewer Confesses, July 9, 2009 [The Wall Street Journal].

Professional Fake Review Writing Services

How to Spot Fake Reviews Manually

Manipulating Social Media (sock puppets - fake identities - fake personas)

Revealed: US spy operation that manipulates social media, Guardian.co.uk, Thursday 17 March 2011.
America's absurd stab at systematising sock puppetry, Guardian.co.uk, Thursday 17 March 2011.

China's Internet "Water Army" (Shuijun) - Opinion Spammers

You can hire people to write and post fake reviews or comments, and even bribe staff at review, forum and microblog sites to delete posts that you do not like.
'Water Army' Whistleblower Threatened, January 7, 2011, People's Daily.
The Chinese Online "Water Army", June 25, 2010, Wired.com.
If you read Chinese, see this description from Baidu Baike at baidu.com.

Differences from Web Spam and Email Spam

Web spam: Web spamming refers to the use of "illegitimate means" to boost the search rank position of some target Web pages (see this New York Times article). There are two main types of Web spam: link spam and content spam. Opinion spam is very different from Web spam because both link spam and content spam seldom occur in opinion documents such as product reviews. Link spam is spam on hyperlinks, which hardly exists in reviews because there are usually no links among reviews. Content spam adds irrelevant or only remotely relevant words to target Web pages in order to fool search engines, which again hardly occurs in reviews.
Email Spam: Email spamming usually refers to unsolicited commercial advertisements. Although they do exist, advertisements in reviews are rare, and they are relatively easy to detect.

Types of Opinion Spam

There are generally three types of spam reviews (Jindal and Liu WSDM-2008):

Type 1 (fake reviews): These are reviews that deliberately mislead readers or opinion mining systems by giving undeserving positive opinions to some target entities in order to promote the entities and/or by giving unjust or malicious negative opinions to some other entities in order to damage their reputation.
Type 2 (reviews on brands only): These reviews do not comment on the specific products that they are supposed to review, but only on the brands, the manufacturers, or the sellers of the products. Although they may be useful, they are considered spam because they are not targeted at the specific products and are often biased. For example, in a review of an HP printer, the reviewer only wrote "I hate HP. I never buy any of their products".
Type 3 (non-reviews): These appear as reviews, but they are not reviews and express no opinions. There are two main sub-types: (1) advertisements, and (2) other irrelevant texts containing no opinions (e.g., questions, answers, and random texts).

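Type 2 and Type 3 spam reviews are rare, but Type 1 spam reviews are widespread and very hard to detect. Some fake reviews are not very harmful, while others are very harmful; for example, a fake positive review of a genuinely good product does less damage than a fake positive review of a poor-quality product. See details in (Jindal and Liu WSDM-2008) or Chapter 11 of my book Web Data Mining.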
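Because Type 2 and Type 3 spam can be recognized by a human reader, they can be labeled by hand and handled with a supervised classifier. The Python sketch below illustrates that general idea with a tiny logistic-regression model trained by batch gradient descent. The product name "HP X100", the training texts, and the three binary features are hypothetical and chosen only for illustration; see (Jindal and Liu WSDM-2008) for the features and model actually used.

```python
import math

# Hypothetical product and hand-labeled examples (1 = Type 2/3 spam, 0 = genuine).
PRODUCT = "HP X100"
TRAIN = [
    ("The HP X100 prints fast and quietly", 0),
    ("My HP X100 jammed twice in the first week", 0),
    ("I hate HP. I never buy any of their products", 1),  # Type 2: brand only
    ("Visit www.cheap-ink.example for great deals", 1),   # Type 3: advertisement
    ("Does anyone know where to buy toner?", 1),          # Type 3: question
]


def features(text: str, product: str) -> list:
    """Three illustrative binary features (an assumption, not a published set)."""
    lower = text.lower()
    return [
        1.0 if product.lower() in lower else 0.0,           # names the product itself
        1.0 if "?" in text else 0.0,                         # reads like a question
        1.0 if "http" in lower or "www." in lower else 0.0,  # contains a link (ad-like)
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def spam_probability(w, b, text: str, product: str = PRODUCT) -> float:
    """Model's estimated probability that text is Type 2/3 spam."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(text, product))) + b)


X = [features(t, PRODUCT) for t, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_logistic(X, y)
print(spam_probability(w, b, "The HP X100 setup took five minutes"))
print(spam_probability(w, b, "HP support is terrible, avoid this company"))
```

The first test sentence names the product and should score low; the second only talks about the brand, so it should score high. Type 1 spam is much harder: a well-written fake review looks like a genuine one, so simple surface features of this kind are not enough.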

Data Sets

Amazon Product Review Data (huge), used in (Jindal and Liu, WWW-2007, WSDM-2008; Lim et al., CIKM-2010; Jindal, Liu and Lim, CIKM-2010; Mukherjee et al., WWW-2011) for review spam (fake review) detection. It contains information about reviewers, review text, ratings, product info, etc. Due to the large file size, you may need a download manager such as Download Accelerator Plus (DAP) to download it. If you use this data, please cite (Jindal and Liu WSDM-2008).

Publications

Arjun Mukherjee, Bing Liu, Junhui Wang, Natalie Glance and Nitin Jindal. "Detecting Group Review Spam." WWW-2011 (poster paper), 2011.
Nitin Jindal, Bing Liu and Ee-Peng Lim. "Finding Unusual Review Patterns Using Unexpected Rules." Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, short paper), Toronto, Canada, Oct 26-30, 2010.
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady Lauw. "Detecting Product Review Spammers using Rating Behaviors." Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, full paper), Toronto, Canada, Oct 26-30, 2010.
Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM-2008), Stanford University, Stanford, California, USA, Feb 11-12, 2008.
Nitin Jindal and Bing Liu. "Review Spam Detection." Proceedings of WWW-2007 (poster paper), Banff, Canada, May 8-12, 2007.

Created by Bing Liu, 2008.