CS 594 Fall 2003

Data Mining and Text Mining


Course Objective and Organization

This course has three objectives. First, to provide students with a sound basis in data mining tasks and techniques. Second, to ensure that students are able to read, present and critically evaluate data mining research papers. Third, to ensure that students are able to implement and use some of the important data mining and text mining algorithms.

This course is organized in nine (9) sections, which cover all the main topics of data mining and text mining. For each topic, the instructor will first give a few introductory lectures. Then, the class will discuss some research papers. Each class discussion starts with a paper presentation (in a seminar format) by a student assigned to read and present the paper. Two programming assignments will be given to ensure that students are able to implement and use some important data mining techniques.

General Information

Final Exam

Grading

Prerequisites

Teaching materials

Topics (subject to change)

  1. Introduction
  2. Data pre-processing: data cleaning, transformation, feature selection and discretization (slides)
  3. Association rule mining (slides)
  4. Classification (supervised learning) (slides)
  5. Clustering (unsupervised learning) (slides)
  6. Post-processing: Are all the data mining results interesting? (slides)
  7. Text mining (slides)
  8. Partially supervised learning (Part I slides)
    • Learning with a small set of labeled data and a large set of unlabeled data
    • Learning with positive and unlabeled data
    • Experiments with the LPU system
  9. Introduction to Web mining: Search, information extraction and integration, Web log mining, personalization and recommendation.
  10. Summary

Programming Projects - graded (you will demo your program to me)

For both programming projects, you should use exactly the same file format as C4.5 and CBA.

  1. Implement a naive Bayesian classifier
  2. Implement the k-means algorithm
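
As a rough illustration of what the two projects involve, the sketch below shows one possible shape for each algorithm in Python. It is not a reference solution and makes several assumptions that the assignment does not state: all function names are hypothetical, the classifier handles categorical attributes only and uses Laplace (add-one) smoothing, k-means uses squared Euclidean distance with randomly sampled initial centroids, and the data is passed in as in-memory lists rather than read from C4.5 or CBA files.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies (categorical data)."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = frequency
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for Laplace smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict_naive_bayes(model, row):
    """Return the class with the highest smoothed log posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Assumed add-one smoothing so unseen values do not zero the product.
            score += math.log((count + 1) / (n + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of numbers; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment
```

A call such as `predict_naive_bayes(train_naive_bayes(rows, labels), row)` classifies one record, and `kmeans(points, 2)` returns the centroids together with a cluster index for each point. Reading the required file format and handling continuous attributes are left out here.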

Paper presentation - graded


Rules and Policies

By Bing Liu, Aug 4, 2003.