Web Data Mining

Exploring Hyperlinks, Contents, and Usage Data

Bing Liu, Second Edition, July 2011
First Edition, Dec 2006, Springer

Web mining aims to discover useful knowledge from Web hyperlinks, page contents, and usage logs. Based on the primary kind of data used in the mining process, Web mining tasks are categorized into three main types: Web structure mining, Web content mining, and Web usage mining. This book consists of two parts. The first part covers the data mining and machine learning foundations, presenting the essential algorithms of both fields. The second part covers the key topics of Web mining: Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated in breadth and in depth (the SVD matrix factorization algorithm of Simon Funk, used in the Netflix Prize contest, is described in detail).

What is new in the second edition? Most chapters have been updated. The major changes are in Chapter 11 and Chapter 12, which have been rewritten and significantly expanded. When the first edition was written, opinion mining (Chapter 11) was still in its infancy. Since then, the research community has proposed many novel techniques to solve various aspects of the problem. To bring the Web usage mining chapter (Chapter 12) up to date, the topics of recommender systems and collaborative filtering, query log mining, and computational advertising have been added. The new edition is thus considerably longer, growing from 532 pages in the first edition to 622 pages in the second.

Teaching and Learning: Although the book is titled "Web Data Mining", it also covers the key topics of data mining, information retrieval, and text mining. It is thus suitable for a data mining course in which students learn not only data mining, but also Web mining and text mining. The book is appropriate for advanced undergraduate students, graduate students, researchers, and practitioners in the field. No prior knowledge of data mining or machine learning is assumed.

Order the Second Edition

From Amazon.com
From Springer
Online version available from Springer

2011, 622 p., Hardcover
ISBN 978-3-642-19459-7

Order the First Edition

From Amazon.com
From Barnes&Noble
From Springer (see the new edition above)

2007, 532 p., Hardcover
ISBN-10: 3-540-37881-2
ISBN-13: 978-3-540-37881-5

Here is a review of the book by Prof. Olfa Nasraoui, published in SIGKDD Explorations, Volume 10, Issue 2, 2009. (I found this on the Web).

If you need an evaluation copy for teaching a course, please drop me an email. I can connect you to the publisher to get a copy. You can also get it from this page.

Table of Contents (Full Version, Second Edition)

1. Introduction
2. Association Rules and Sequential Patterns
3. Supervised Learning
4. Unsupervised Learning
5. Partially Supervised Learning
6. Information Retrieval and Web Search
7. Social Network Analysis
8. Web Crawling
9. Structured Data Extraction: Wrapper Generation
10. Information Integration
11. Opinion Mining and Sentiment Analysis
12. Web Usage Mining

Chapter 8 was written by Prof. Filippo Menczer. Chapter 12 was mainly written by Prof. Bamshad Mobasher and Prof. Olfa Nasraoui (second edition), except for the recommender systems section, with which they also helped. Prof. Wee Sun Lee helped a great deal in the writing of Chapter 5. They are all international experts in their fields.

Lecture Slides

Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7,
Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12.

Errata List of the First Edition

Errata list of the first and second printings.
- Many thanks to Shenghua Bao, Brian Davison, Juliana Freire, Po-Hsiu Lin, Olfa Nasraoui, Suhyuk Park, Guillermo Vazquez, Clement Yu and Yuri Zelenkov.
Your comments and errata are appreciated. Please drop me an email if you have any.

First Draft: by Bing Liu on Oct 15, 2006.