CS582 - Information Retrieval
Textbook
Principles of Database Query Processing for Advanced Applications. By C. Yu and W. Meng, Morgan Kaufmann, 1998
Description
This course covers techniques for retrieving text, images, and video.
Lecture Notes
- Document & Query Representation: vectors, stop words, stemming, etc.
- Similarity (Q, D):
- Binary independence model.
- Non-binary independence model.
- Tree dependency model.
- Internet Environment: title, headings, large fonts, anchor text; PageRank, hubs, and authorities, etc.
- Meta Search Engine (Meta-SE):
- Deciding which SE (DB) to use for a given query: optimal, heuristic, and learning methods, etc.
- Merging process: min and max methods, etc.
- Feedback:
- Rocchio method.
- Estimate (Binary/Nonbinary Independence Model).
- Perceptron.
- Permanent learning.
- Pseudo Feedback: term correlation including phrase.
- Clustering:
- Graph-theoretic method.
- 1-pass (k-means, balanced tree).
- Adaptive clustering.
- Classification (Naive Bayes).
- Image Retrieval:
- Text.
- Image characteristics.
- Semantics (objects, properties, actions, spatial relationships, etc.).
- Video Retrieval: hierarchical, temporal.
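As a rough illustration of two topics in the outline above (vector representation with stop-word removal, and Rocchio feedback), here is a minimal Python sketch. It is not taken from the course materials: the stop list, the whitespace tokenizer, and the default `alpha`, `beta`, and `gamma` weights are assumptions chosen only for the example, and stemming is omitted.

```python
import math
from collections import Counter

# Hypothetical, tiny stop list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "for"}


def to_vector(text):
    """Map text to a sparse term-frequency vector, dropping stop words."""
    terms = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(terms)


def cosine(q, d):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: shift the query toward the centroid of the relevant
    documents and away from the centroid of the nonrelevant ones."""
    new_q = Counter()
    for t, w in query.items():
        new_q[t] += alpha * w
    for doc in relevant:
        for t, w in doc.items():
            new_q[t] += beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new_q[t] -= gamma * w / len(nonrelevant)
    # Assumption: terms whose weight is not positive are dropped.
    return {t: w for t, w in new_q.items() if w > 0}
```

A query can then be compared with documents via `cosine(to_vector(query), to_vector(doc))`, and `rocchio` can be applied once the user has marked some results as relevant or nonrelevant.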
Term Project
Form Information Extraction System - Many web sites use an HTML form as the interface through which users submit queries. Automatic information collection becomes much easier if we can identify each field of such a form and feed the user's query into the corresponding input field. However, because HTML syntax is flexible and loosely structured, different web sites use forms with different layouts and compositions, which makes this identification difficult. In this project we proposed a solution to form identification that uses both structural and semantic information, and based on it we built an architecture with a user feedback mechanism to achieve our identification goal. (final report)
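To make the field-identification problem concrete, the following Python sketch collects the fields of each HTML form using the standard library's `html.parser`. It does not reproduce the project's actual method: the `FormFieldCollector` class and the `guess_query_field` heuristic (pick the first named free-text input) are hypothetical and use structural information only.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the fields of every <form> in an HTML page."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form encountered
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif tag in self.FIELD_TAGS and self._current is not None:
            # An <input> without a type attribute defaults to a text field.
            kind = (attrs.get("type") or "text") if tag == "input" else tag
            self._current["fields"].append(
                {"name": attrs.get("name"), "kind": kind.lower()}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def guess_query_field(form):
    """Hypothetical heuristic: treat the first named free-text input
    as the field that should receive the user's query."""
    for field in form["fields"]:
        if field["kind"] in ("text", "search") and field["name"]:
            return field["name"]
    return None
```

For example, after `collector = FormFieldCollector()` and `collector.feed(page_html)`, each entry of `collector.forms` lists the field names and kinds of one form. A semantic layer, such as matching nearby label text, would be needed to tell apart several text fields in the same form.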