Web Data Mining

Exploring Hyperlinks, Contents and Usage Data

Bing Liu, Springer, December, 2006


Web mining aims to discover useful information or knowledge from the Web hyperlink structure, page content and usage logs. Based on the primary kind of data used in the mining process, Web mining tasks are categorized into three main types: Web structure mining, Web content mining and Web usage mining.

The goal of this book is to present these tasks and their essential algorithms. It is written for advanced undergraduate students, graduate students, researchers and development professionals in the field. No prior knowledge of data mining or statistics is assumed. In fact, the book covers the essential topics of data mining as well.

Order the Book

2007, 532 p., Hardcover
ISBN-10: 3-540-37881-2
ISBN-13: 978-3-540-37881-5

Here is a review of the book by Prof. Olfa Nasraoui, published in SIGKDD Explorations, Volume 10, Issue 2, 2009. (I found this on the Web).

If you need an evaluation copy for teaching a course, please drop me an email. I can connect you to the publisher to get a copy.


Table of Contents (Full Version)

  1. Introduction
  2. Association Rules and Sequential Patterns
  3. Supervised Learning
  4. Unsupervised Learning
  5. Partially Supervised Learning
  6. Information Retrieval and Web Search
  7. Link Analysis
  8. Web Crawling
  9. Structured Data Extraction: Wrapper Generation
  10. Information Integration
  11. Opinion Mining
  12. Web Usage Mining

Chapters 8 and 12 were written by Prof. Filippo Menczer and Prof. Bamshad Mobasher respectively. They are international experts in their fields. They were also kind enough to revise the chapters numerous times to ensure that they integrate well into the common framework of the whole book.

Lecture Slides

Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7,
Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12.

You may be interested in the talks below:

  • ACM SIGKDD Inaugural Webcast: Web Content Mining
  • A tutorial given at WWW-05 and WISE-05
  • Slides of several talks given at Google, Yahoo, Microsoft and other places. Drop me an email if you want them.
  • Of course, the book covers many more topics and algorithms, and is also more up to date.

Errata List


First Draft: by Bing Liu on Oct 15, 2006.