In Web search, the user first issues a query and the search engine returns a ranked list of pages. The user then browses some of the top-ranked pages to find what he or she is interested in. This classic paradigm is sufficient for finding a specific piece of information, e.g., the homepage of a person or the PDF file of a research paper. If the user is interested in open-ended exploration or in complete information about a search topic, however, it leaves much to be desired. It would be very useful if the system could combine individual pieces of information from multiple pages to form a coherent picture of the search topic. We call this kind of search Deep Search, and we call the process of discovering and integrating the different pieces of information knowledge synthesis or information synthesis.
Abstract: Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a topic from the Web is becoming increasingly important and popular, thanks to the Web's convenience and its richness of information. In many cases, learning from the Web may even be essential: in our fast-changing world, new topics emerge constantly and rapidly, and there is often not enough time for someone to write a book on them. To learn about such emerging topics, one can resort to research papers. However, research papers are often hard for non-researchers to understand, and few research papers cover every aspect of a topic. In contrast, many Web pages contain intuitive descriptions of the topic. To find such Web pages, one typically uses a search engine. However, current search techniques are not designed for in-depth learning. The top-ranking pages from a search engine may not contain any description of the topic. Even if they do, the description is usually incomplete, since it is unlikely that the owner of a page has good knowledge of every aspect of the topic. In this research, we attempt a novel and challenging task: mining topic-specific knowledge on the Web. Our goal is to help people learn in-depth knowledge of a topic systematically on the Web. The proposed techniques first identify the sub-topics or salient concepts of the topic, and then find and organize the informative pages that contain definitions and descriptions of the topic and its sub-topics, just like those in a traditional book or a good survey article. In other words, the system tries to generate a "book" or a "survey article" on the topic from the information on the Web, i.e., a table of contents of sub-topics in which each sub-topic points to its sub-subtopics and its content (detailed description) pages. Our initial work is described in the following paper.
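As a rough illustration of the pipeline outlined in the abstract (pick salient sub-topics, keep the pages that define or describe them, and arrange the result as a table of contents), here is a minimal Python sketch. It is not the method of the paper: the cue phrases, the function names (salient_subtopics, is_informative, build_outline, render), the TopicNode class, and the assumption that pages arrive as a dictionary of URL to plain text are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue phrases that often signal a definition (illustrative only).
DEFINITION_CUES = [
    r"\b{t} (?:is|are) (?:a|an|the)\b",
    r"\b{t} (?:is|are) defined as\b",
    r"\b{t} refers? to\b",
]


@dataclass
class TopicNode:
    """One entry of the generated table of contents."""
    name: str
    pages: list = field(default_factory=list)      # URLs of informative pages
    subtopics: list = field(default_factory=list)  # child TopicNode objects


def salient_subtopics(candidates, pages, min_pages=2):
    """Keep candidate phrases mentioned in at least min_pages pages.

    Assumption: candidate phrases are supplied by the caller; how they are
    extracted from the pages is outside the scope of this sketch.
    """
    keep = []
    for phrase in candidates:
        hits = sum(1 for text in pages.values() if phrase.lower() in text.lower())
        if hits >= min_pages:
            keep.append(phrase)
    return keep


def is_informative(text, topic):
    """Return True if the text appears to define or describe the topic."""
    t = re.escape(topic)
    return any(
        re.search(cue.format(t=t), text, re.IGNORECASE)
        for cue in DEFINITION_CUES
    )


def build_outline(topic, subtopics, pages):
    """Attach informative pages to the topic and to each sub-topic."""
    root = TopicNode(topic)
    root.pages = [url for url, text in pages.items() if is_informative(text, topic)]
    for sub in subtopics:
        node = TopicNode(sub)
        node.pages = [url for url, text in pages.items() if is_informative(text, sub)]
        root.subtopics.append(node)
    return root


def render(node, depth=0):
    """Return the outline as indented lines, one per (sub-)topic."""
    lines = ["  " * depth + node.name + ": " + ", ".join(node.pages)]
    for child in node.subtopics:
        lines.extend(render(child, depth + 1))
    return lines
```

Calling build_outline with a topic, the output of salient_subtopics, and a dictionary of page texts yields a small tree, and render turns that tree into an indented outline. A fixed list of cue phrases is used here only because it keeps the sketch short; a real system would need far more robust ways to find concepts and definitions.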
Note: The original title was "Mining a book on the Web," but the reviewers did not like it and asked us to change it. We do not like the new title above.
Created on Feb 5, 2004 by Bing Liu.