Opinion Spam Detection: Detecting Fake Reviews and Reviewers
Many names: Spam Review, Fake Review, Bogus Review, Deceptive Review,
Opinion Spammer, Review Spammer, Fake Reviewer, Shill (Stooge or Plant)
(See this front-page article in The New York Times, Jan. 26, 2012)
(Bloomberg BusinessWeek, Sept. 29, 2011 and more ... )
It has become common practice for people to read online opinions/reviews for different purposes. For example, if one wants to buy a product, one typically goes to a review site (e.g., amazon.com) to read some reviews of the product. If most reviews are positive, one is likely to buy the product. If most reviews are negative, one will almost certainly not buy it.
Positive opinions can result in significant financial gain and/or fame for businesses, organizations, and individuals. This, unfortunately, gives strong incentives for opinion spamming.
Opinion Spamming: It refers to "illegal" activities (e.g., writing fake reviews, also called shilling) that try to mislead readers or automated opinion mining and sentiment analysis systems by giving undeserved positive opinions to some target entities in order to promote the entities and/or by giving false negative opinions to some other entities in order to damage their reputations.
Opinion spam has many forms, e.g., fake reviews (also called bogus reviews), fake comments, fake blogs, fake social network postings, deceptions, and deceptive messages.
We believe that as opinions on the Web are increasingly used in practice by consumers, organizations, and businesses for their decision making, opinion spamming will get worse and also more sophisticated.
Detecting spam reviews or opinions will become more and more critical. The situation is already quite bad.
To the best of my knowledge, my group is the first to conduct research on detecting fake reviews and reviewers (or shills). Our first paper was published in 2007, and subsequent papers were published in 2008, 2010, and 2012. Both of my books, Web Data Mining and Sentiment Analysis and Opinion Mining, discuss the issue.
NOTE: This is closely related to Astroturfing: "Astroturfing refers to political, advertising, or public relations campaigns that are designed to mask the sponsors of the message to give the appearance of coming from a disinterested, grassroots participant. Astroturfing is intended to give the statements the credibility of an independent entity by withholding information about the source's financial connection. The term is a derivation of AstroTurf, a brand of synthetic carpeting designed to look like natural grass." Quoted from the Wikipedia page.
Acknowledgement: This project has been partially funded by the National Science Foundation, Microsoft, and Google.
Fake Review Detection
We have used supervised learning, pattern discovery, graph-based methods, and relational modeling to solve the problem. Below are some main signals that we have used:
Lexical features such as word n-grams, part-of-speech n-grams, and other lexical attributes.
Content and style similarity of reviews from different reviewers.
Semantic inconsistency (we have never used this kind of feature). For example, a reviewer wrote "My wife and I bought this car ..." in one review and then in another review he/she wrote "My husband really love ..." (I heard this example from a friend at a company that actively detects fake reviews).
Reviewer abnormal behaviors:
Public data available from Web sites, e.g., reviewer id, time of posting, frequency of posting, first reviewers of products, and many more. For example, do you see anything wrong with the reviews from this user, Big John? What about after you see the reviews of these two users, Cletus and Jake? In fact, if you browse the reviews of their reviewed products, you will find another user/reviewer. This is just one example of the atypical behaviors that our algorithms are able to discover.
Web site private/internal data (we have not used such data, but they are extremely useful), e.g., IP and MAC addresses, time taken to post a review, physical location of the reviewer, etc. (there are many such signals).
Product-related features, e.g., product description, sales volume, and sales rank.
Relationships: Complex relationships among reviewers, reviews, and entities (e.g., products and stores).
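As a rough illustration of two of the signals listed above (content similarity of reviews from different reviewers, and posting frequency taken from public data), here is a minimal Python sketch. It is not the algorithms from the papers cited on this page. The function names, the record fields (`id`, `reviewer`, `date`, `text`), the bigram size, the 0.5 threshold, and the sample reviews are all assumptions made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations
import re


def word_ngrams(text, n=2):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(reviews, n=2, threshold=0.5):
    """Yield (id_a, id_b, score) for reviews written by different
    reviewers whose n-gram Jaccard similarity is >= threshold."""
    grams = [(r["id"], r["reviewer"], word_ngrams(r["text"], n)) for r in reviews]
    for (id_a, rev_a, g_a), (id_b, rev_b, g_b) in combinations(grams, 2):
        if rev_a == rev_b:
            continue  # only compare reviews from different reviewers
        score = jaccard(g_a, g_b)
        if score >= threshold:
            yield id_a, id_b, score


def max_reviews_per_day(reviews):
    """Map each reviewer to the largest number of reviews posted on one date."""
    per_day = defaultdict(Counter)
    for r in reviews:
        per_day[r["reviewer"]][r["date"]] += 1
    return {rev: max(days.values()) for rev, days in per_day.items()}


# Hypothetical sample records, invented for this example.
reviews = [
    {"id": 1, "reviewer": "u1", "date": "2012-01-05",
     "text": "Great camera, sharp pictures and long battery life."},
    {"id": 2, "reviewer": "u2", "date": "2012-01-05",
     "text": "Great camera, sharp pictures and long battery life!"},
    {"id": 3, "reviewer": "u2", "date": "2012-01-05",
     "text": "The zoom lens broke after two weeks."},
    {"id": 4, "reviewer": "u3", "date": "2012-02-11",
     "text": "Decent phone but the screen scratches easily."},
]

pairs = list(similar_pairs(reviews))    # near-duplicate reviews across reviewers
bursts = max_reviews_per_day(reviews)   # per-reviewer peak daily posting count
print(pairs)
print(bursts)
```

Neither score proves that a review is fake. Near-duplicate text across reviewers and unusually bursty posting are only cues that would be combined with the other signals in the list, for example as features for a supervised classifier.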
I am doubtful that people can reliably spot fake reviews (especially well-written ones). My experiments with 30+ students bear this out. One of the fallacies is that people usually assume others would write like them or should write in certain ways.
Manipulating Social Media (sock puppets - fake identities - fake personas)
Amazon Product Review Data (Huge) used in (Jindal and Liu, WWW-2007; WSDM-2008; Lim et al., CIKM-2010; Jindal, Liu and Lim, CIKM-2010; Mukherjee et al., WWW-2011; Mukherjee, Liu and Glance, WWW-2012) for review spam (fake review) detection. It has information about reviewers, review text, ratings, product info, etc. Due to the large file size, you may need to use Download Accelerator Plus (DAP) to download it. If you use this data, please cite (Jindal and Liu, WSDM-2008).
Jing Wang, Clement T. Yu, Philip S. Yu, Bing Liu, Weiyi Meng. "Diversionary comments under blog posts." Accepted. ACM Transactions on the Web (TWEB), 2015.
Huayi Li, Zhiyuan Chen, Arjun Mukherjee, Bing Liu and Jidong Shao. "Analyzing and Detecting Opinion Spam on a Large-scale Dataset via Temporal and Spatial Patterns." Short paper at ICWSM-2015, 2015.
Arjun Mukherjee, Abhinav Kumar, Bing Liu, Junhui Wang, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh. "Spotting Opinion Spammers using Behavioral Footprints." To appear in Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2013), August 11-14, 2013, Chicago, USA.
Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Natalie Glance. "What Yelp Fake Review Filter Might Be Doing." Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA.
Nitin Jindal and Bing Liu. "Opinion Spam and Analysis." Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008), Feb 11-12, 2008, Stanford University, Stanford, California, USA.
Three Reviews - Can you figure out which ones are fake?
I want to make this review in order to comment on the excellent service that my mother and I received on the Serenade of the Seas, a cruise line for Royal Caribbean. There was a lot of things to do in the morning and afternoon portion for the 7 days that we were on the ship. We went to 6 different islands and saw some amazing sites! It was definitely worth the effort of planning beforehand. The dinner service was 5 star for sure. One of our main waiters, Muhammad was one of the nicest people I have ever met. However, I am not one for clubbing, drinking, or gambling, so the nights were pretty slow for me because there was not much else to do. Either than that, I recommend the Serenade to anyone who is looking for excellent service, excellent food, and a week full of amazing day-activities!
This movie starring big names - Tom Hanks, Sandra Bullock, Viola Davis, and John Goodman - is one of the most emotionally endearing films of 2012. While some might argue that this film was "too Hollywood" and others might see the film solely because of the cast, it is Thomas Horn's performance as young Oskar that is deserving of awards. The story is about a 9-year-old boy on a journey to make sense of his father's tragic death in the 9/11 attacks on the World Trade Center. Oskar is a bright and nervous adventurer calmed only by the rattle of a tambourine in his ear. "I got tested once to see if I had Asperger's disease," the boy offers in explain of his odd behavior. "The tests weren't definitive." One year after the tragedy, Oskar finds a key in his father's closest and thus begins a quest to find the missing lock. Oskar's battle to control his emotional anxiety and form and mend relationships proves difficult, even with his mother. "If the sun were to explode, you wouldn't even know about it for eight minutes," Oskar narrates. "For eight minutes, the world would still be bright and it would still feel warm." Those fleeting eight minutes Oskar has left of his father make for two hours and nine minutes of Extremely Emotional and Incredibly Inspiring film. Leaving the theatre, emotionally drained, it is a wonder where a movie like this has been. We saw Fahrenheit 9/11 and United 93, but finally here is the story of a New York family's struggle to understand why on "the worst day" innocent people would die. I highly recommend this movie as a must see.
High Points: Guacamole burger was quite tall; clam chowder was tasty. The decor was pretty good, but not worth the downsides. Low Points: Noisy, noisy, noisy. The appetizers weren't very good at all. And the service kind of lagged. A cross between Las Vegas and Disney world, but on the cheesy side. This Cafe is a place where you eat inside a plastic rain forest. The walls are lined with fake trees, plants, and wildlife, including animatronic animals. A flowing waterfall makes sure that you won't hear the conversations of your neighbors without yelling. I could see it being fun for a child's birthday party (there were several that occurred during our meal), but not a place to go if you're looking for a good meal.
The answer is at the bottom of my homepage. If you get them right, please let me know your clues. You can click my name below to get to my homepage.