Opinion Spam Detection: Detecting Fake Reviews and Reviewers
Many names: Spam Review, Fake Review, or Bogus Review
Opinion Spammer, Review Spammer, or Fake Reviewer
Deception, Deceptive Message
Introduction
It has become common practice for people to find and read opinions/reviews on the Web for many purposes. For example, if one wants to buy a product, one typically goes to a merchant or review site (e.g., amazon.com) to read reviews from existing users of the product. If one sees many positive reviews of the product, one is very likely to buy it. However, if one sees many negative reviews, one will most likely choose another product. Positive opinions can result in significant financial gains and/or fame for organizations and individuals. This, unfortunately, gives strong incentives for opinion spam.
Opinion Spam: Opinion spamming refers to "illegal" activities (e.g., writing fake reviews) that try to deliberately mislead readers or automated opinion mining and sentiment analysis systems by giving undeserving positive opinions to some target entities in order to promote the entities and/or by giving false negative opinions to some other entities in order to damage their reputation.
Opinion spam comes in many forms, e.g., fake reviews (also called bogus reviews), fake comments, fake blogs, fake social network postings, deceptions, and deceptive messages. Manually spotting such postings is very hard, but there are several pages on the Web (see below) which tell people how to spot fake reviews and deceptive messages. To the best of our knowledge, our group was the first in academia to conduct research on detecting fake reviews and reviewers. Our first paper was published in 2007, and subsequent papers were published in 2008 and 2010. My textbook Web Data Mining also has a section in Chapter 11 discussing the issue (Springer, Second Edition, July 2011; First Edition, Dec. 2006). The objective of our current project is to detect fake reviews. We have not worked on detecting other forms of opinion spam.
Fake review detection: We have used both supervised and unsupervised methods for the task. Three main types of features or signals are:
Review contents: Words and other linguistic features
Reviewer abnormal behaviors: Do you see anything wrong with the reviews of this person: Big John? What about after seeing the reviews of these two persons: Cletus and Jake? This is just one example of the atypical behaviors that our algorithm is able to discover.
Product features: For example, product descriptions and sales ranks
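To make the three kinds of signals concrete, here is a minimal sketch of how they might be gathered into one feature dictionary per review. It is only an illustration and not the feature set used in our papers: the Review record, the function names, the sales_rank lookup, and the specific signals (word count, exclamation marks, rating deviation, repeated reviews of the same product) are all assumptions chosen for simplicity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    # Hypothetical record layout, assumed for this sketch only.
    reviewer_id: str
    product_id: str
    rating: int  # star rating, 1 to 5
    text: str


def content_features(review):
    """Review contents: crude word-level signals (illustrative only)."""
    words = review.text.lower().split()
    return {
        "n_words": len(words),
        "n_exclamations": review.text.count("!"),
    }


def behavior_features(review, all_reviews):
    """Reviewer behavior: compare this reviewer's activity with that of others."""
    by_reviewer = [r for r in all_reviews if r.reviewer_id == review.reviewer_id]
    # Ratings given to the same product by other reviewers.
    others = [
        r.rating
        for r in all_reviews
        if r.product_id == review.product_id and r.reviewer_id != review.reviewer_id
    ]
    return {
        # Distance between this rating and the other reviewers' average.
        "rating_deviation": abs(review.rating - mean(others)) if others else 0.0,
        "reviews_by_reviewer": len(by_reviewer),
        # How many times this reviewer reviewed this same product.
        "repeat_reviews_of_product": sum(
            r.product_id == review.product_id for r in by_reviewer
        ),
    }


def product_features(review, sales_rank):
    """Product features: here only a sales rank (-1 if unknown)."""
    return {"sales_rank": sales_rank.get(review.product_id, -1)}


def feature_vector(review, all_reviews, sales_rank):
    """Merge the three feature groups into one dictionary."""
    features = {}
    features.update(content_features(review))
    features.update(behavior_features(review, all_reviews))
    features.update(product_features(review, sales_rank))
    return features


reviews = [
    Review("a", "p1", 5, "Great product! Buy it!"),
    Review("b", "p1", 1, "Broke fast"),
    Review("c", "p1", 2, "Not good"),
    Review("a", "p1", 5, "Best ever!"),
]
fv = feature_vector(reviews[0], reviews, {"p1": 120})
print(fv)
```

A dictionary like this could be fed to a supervised classifier when labeled reviews are available, or used to rank reviews and reviewers by how unusual they look in an unsupervised setting.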
We can safely predict that as opinions on the Web are increasingly used in practice by consumers, organizations, and businesses for their decision making, opinion spam will get worse and also more sophisticated.
Detecting spam reviews or opinions will become more and more critical. The situation is already quite bad. When I have time, I will write more about it. You can also have a look at our papers.
Acknowledgement: This project has been funded by Microsoft and Google.
China's Internet "Water Army" (Shuijun) - Opinion Spammers
You can hire people to write and post fake reviews or comments, and even bribe staff at review, forum and microblog sites to delete posts that you do not like.
If you read Chinese, see this description from Baidu Baike at baidu.com.
Differences from Web Spam and Email Spam
Web spam: Web spamming refers to the use of "illegitimate means" to boost the search rank position of some target Web pages (see this New York Times article). There are two main types of Web spam: link spam and content spam. Opinion spam is very different from Web spam because both link spam and content spam seldom occur in opinion documents such as product reviews. Link spam is spam on hyperlinks, which hardly exists in reviews as there are usually no links among reviews. Content spam tries to add irrelevant or remotely relevant words to target Web pages in order to fool search engines, which again hardly occurs in reviews.
Email Spam: Email spamming usually refers to unsolicited commercial advertisements. Although advertisements do appear in reviews, they are rare. They are also relatively easy to detect.
Types of Opinion Spam
There are generally three types of spam reviews (Jindal and Liu WSDM-2008):
Type 1 (fake reviews): These are reviews that deliberately mislead readers or opinion mining systems by giving undeserving positive opinions to some target entities in order to promote the entities and/or by giving unjust or malicious negative opinions to some other entities in order to damage their reputation.
Type 2 (reviews on brands only): These reviews do not comment on the specific products that they are supposed to review, but only comment on the brands, the manufacturers or the sellers of the products. Although they may be useful, they are considered spam because they are not targeted at the specific products and are often biased. For example, in a review for an HP printer, the reviewer only wrote "I hate HP. I never buy any of their products".
Type 3 (non-reviews): These are not reviews or opinionated although they appear as reviews. There are two main sub-types: (1) advertisements, and (2) other irrelevant texts containing no opinions (e.g., questions, answers, and random texts).
Type 2 and Type 3 spam are rare, but Type 1 spam reviews are widespread and very hard to detect. Some fake reviews are not so harmful, but some are very harmful. See details in (Jindal and Liu WSDM-2008) or Chapter 11 of my book Web Data Mining.
Data Sets
Amazon Product Review Data (Huge) used in (Jindal and Liu, WWW-2007, WSDM-2008; Lim et al, CIKM-2010; Jindal, Liu and Lim, CIKM-2010, and Mukherjee et al. WWW-2011) for review spam (fake review) detection. It has information about reviewers, review text, ratings, product info, etc. Due to the large file size, you may need to use Download Accelerator Plus (DAP) to download. If you use this data, please cite (Jindal and Liu WSDM-2008).
Nitin Jindal, Bing Liu and Ee-Peng Lim. "Finding Unusual Review Patterns Using Unexpected Rules." Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, short paper), Toronto, Canada, Oct 26 - 30, 2010.
Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu and Hady Lauw. "Detecting Product Review Spammers using Rating Behaviors." Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM-2010, full paper), Toronto, Canada, Oct 26 - 30, 2010.
Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.
Nitin Jindal and Bing Liu. "Review Spam Detection." Proceedings of WWW-2007 (poster paper), May 8-12, 2007, Banff, Canada.