CS 583 Spring 2005

CS 583 - Spring 2005 (Under Construction ....)

Data Mining and Text Mining


Course Objective and Organization

This course has three objectives. First, to provide students with a sound basis in data mining tasks and techniques. Second, to ensure that students are able to read, present and critically evaluate data mining research papers. Third, to ensure that students are able to implement and use some of the important data mining and text mining algorithms.

This course is organized in nine (9) sections, which cover all the main topics of data mining and text mining. For each topic, the instructor will first give a few introductory lectures. Then, the class will discuss some research papers. Each class discussion starts with a paper presentation (in a seminar format) by a student assigned to read and present the paper. Two programming assignments will be given to ensure that students are able to implement and use some important data mining techniques.

Think and Ask!

If you have questions about any topic or assignment, DO ASK me or even your classmates for help. I am here to make the course understood. DO NOT delay your questions. There is no such thing as a stupid question. The only obstacle to learning is laziness.

General Information

Grading

Prerequisites

Teaching materials

Topics (subject to change)

  1. Introduction Slides
  2. Data pre-processing: data cleaning, transformation, feature selection and discretization Slides
  3. Association rule mining Slides
  4. Classification (supervised learning) Slides
  5. Clustering (unsupervised learning) Slides
  6. Post-processing: Are all the data mining results interesting? Slides
  7. Text mining Slides
  8. Partially supervised learning Slides
    • Learning with a small set of labeled and a large set of unlabeled data
    • Learning with positive and unlabeled data
    • Experiment with the LPU system
  9. Introduction to Web mining: Search, information extraction and integration, Web log mining, personalization and recommendation.
  10. Summary

Programming Projects - graded (you will demo both programs to me at the same time)

For both programming projects, you should use exactly the same file format as C4.5 and CBA.

  1. Implement a naive Bayesian classifier
  2. Implement the k-means algorithm
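
For orientation only, the second project can be pictured with the minimal sketch below. It is not the required solution or a reference implementation. It assumes numeric attributes already loaded into memory as tuples, Euclidean distance, and the first k points as initial centroids; the function name kmeans and its parameters are hypothetical. Reading the C4.5/CBA file format is deliberately left out.

```python
import math


def kmeans(points, k, max_iter=100):
    """Basic k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    # Assumption: seed the centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (10, 11)]
    print(kmeans(data, 2))
```

A real submission would also need to choose the initial centroids more carefully (for example, at random) and handle the attribute types declared in the data files; both are outside the scope of this sketch.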

Paper presentation - graded


Rules and Policies

By Bing Liu, Dec 8 2004.