Web Data Mining

Exploring Hyperlinks, Contents and Usage Data

Bing Liu, Springer, December 2006


Web mining aims to discover useful information or knowledge from Web hyperlink structure, page content and usage logs. Based on the primary kind of data used in the mining process, Web mining tasks are categorized into three main types: Web structure mining, Web content mining and Web usage mining.

The goal of this book is to present these tasks and their essential algorithms. It is written for advanced undergraduate students, graduate students, researchers and development professionals in the field. No prior knowledge of data mining or statistics is assumed. In fact, the book covers the essential topics of data mining as well.

Order the Book

2007, 532 p., Hardcover
ISBN-10: 3-540-37881-2
ISBN-13: 978-3-540-37881-5

If you need an evaluation copy for teaching a course, please drop me an email. I can connect you to the editor to get a copy quickly.


Table of Contents (Full Version)

  1. Introduction
  2. Association Rules and Sequential Patterns
  3. Supervised Learning
  4. Unsupervised Learning
  5. Partially Supervised Learning
  6. Information Retrieval and Web Search
  7. Link Analysis
  8. Web Crawling
  9. Structured Data Extraction: Wrapper Generation
  10. Information Integration
  11. Opinion Mining
  12. Web Usage Mining

Lecture Slides

Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7,
Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12.

Still working on Chapter 12 (the slides for Chapter 6 cover only a small portion of the chapter in the book).

You may also be interested in some of the talks below:

  • ACM SIGKDD Inaugural Webcast: Web Content Mining
  • A tutorial given at WWW-05 and WISE-05
  • Slides of several talks given at Google, Yahoo, Microsoft and other places. Drop me an email if you want them.
  • Of course, the book covers many more topics and algorithms, and is also more up to date.

Errata List


First Draft: by Bing Liu on Oct 15, 2006.