Web Data Mining

Exploring Hyperlinks, Contents, and Usage Data

Bing Liu, Second Edition, July 2011
First Edition, Dec 2006, Springer


Web mining aims to discover useful knowledge from Web hyperlinks, page content, and usage logs. Based on the primary kind of data used in the mining process, Web mining tasks are categorized into three main types: Web structure mining, Web content mining, and Web usage mining. This book consists of two parts. The first part covers the data mining and machine learning foundations, presenting all the essential algorithms of both fields. The second part covers the key topics of Web mining: Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated in breadth and in depth (the SVD matrix factorization algorithm of Simon Funk, used in the Netflix Prize contest, is described in detail).

What is new in the second edition? Most chapters have been updated. The major changes are in Chapter 11 and Chapter 12, which have been rewritten and significantly expanded. When the first edition was written, opinion mining (Chapter 11) was still in its infancy. Since then, the research community has proposed many novel techniques to solve various aspects of the problem. To bring the Web usage mining chapter (Chapter 12) up to date, the topics of recommender systems and collaborative filtering, query log mining, and computational advertising have been added. This new edition is thus considerably longer, growing from 532 pages in the first edition to 622 pages in the second.

Teaching and Learning: Although the book is titled "Web Data Mining", it also covers the key topics of data mining, information retrieval, and text mining. Thus, it is suitable for a data mining course in which the students learn not only data mining, but also Web mining and text mining. The book is appropriate for advanced undergraduate students, graduate students, researchers, and practitioners in the field. No prior knowledge of data mining or machine learning is assumed.

Order the Second Edition

2011, 622 p., Hardcover
ISBN 978-3-642-19459-7

Order the First Edition

2007, 532 p., Hardcover
ISBN-10: 3-540-37881-2
ISBN-13: 978-3-540-37881-5

Here is a review of the book by Prof. Olfa Nasraoui, published in SIGKDD Explorations, Volume 10, Issue 2, 2009. (I found this on the Web.)

If you need an evaluation copy for teaching a course, please drop me an email. I can connect you to the publisher to get a copy. You can also get it from this page.


Table of Contents (Full Version, Second Edition)

  1. Introduction
  2. Association Rules and Sequential Patterns
  3. Supervised Learning
  4. Unsupervised Learning
  5. Partially Supervised Learning
  6. Information Retrieval and Web Search
  7. Social Network Analysis
  8. Web Crawling
  9. Structured Data Extraction: Wrapper Generation
  10. Information Integration
  11. Opinion Mining and Sentiment Analysis
  12. Web Usage Mining

Chapter 8 was written by Prof. Filippo Menczer. Chapter 12 was mainly written by Prof. Bamshad Mobasher and Prof. Olfa Nasraoui (second edition), except for the recommender systems section, with which they also helped. Prof. Wee Sun Lee helped a great deal in the writing of Chapter 5. They are all international experts in their fields.

Lecture Slides

Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7,
Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12.

You may be interested in some talks below

Errata List of the First Edition


First Draft: by Bing Liu on Oct 15, 2006.