CS 583 Spring 2019

CS 583 - Spring 2019 (two sections)

Data Mining and Text Mining

Course Objective

This course has three objectives. First, to provide students with a sound basis in data mining tasks and techniques. Second, to ensure that students are able to read and critically evaluate data mining research papers. Third, to ensure that students are able to implement and use some of the important data mining and text mining algorithms.

Think and Ask!

If you have questions about any topic or assignment, DO ASK me or even your classmates for help. I am here to make the course understood. DO NOT delay your questions. There is no such thing as a stupid question. The only obstacle to learning is laziness.

General Information

  • Instructor: Bing Liu
    • Email: Bing Liu
    • Tel: (312) 355 1318
    • Office: North end, 3rd floor, library
  • Teaching Assistants:
    • Section 1: Sahisnu Mazumder, sahisnumazumder@gmail.com
    • Section 2: Shuai Wang, swang207@uic.edu
    • Office hours: by appointment

Section 1

  • Course Call Number: 25479
  • Lecture time slot:
    • 12:30-1:45pm Tuesday & Thursday
  • Lecture hall: LC A7
  • Office hours: 2:00pm-3:00pm, Tuesday & Thursday (or by appointment)

Section 2

  • Course Call Number: 39840
  • Lecture time slot:
    • 3:30-4:45pm Tuesday & Thursday
  • Lecture hall: LC A4
  • Office hours: 2:00pm-3:00pm, Tuesday & Thursday (or by appointment)

Grading

  • Quizzes: 10% (2 or more)
  • Midterm: 30% (2 midterms)
  • Final Exam: 40%
  • Projects: Done in groups (2 students per group)

    Prerequisites

    Teaching materials

    Topics (subject to change; the reading list follows each chapter title)

    1. Introduction
    2. Data pre-processing
      • Data cleaning
      • Data transformation
      • Data reduction
      • Discretization
    3. Association rules and sequential patterns (Sections 2.1 - 2.7)
      • Basic concepts
      • Apriori Algorithm
      • Mining association rules with multiple minimum supports
      • Mining class association rules
      • Sequential pattern mining
      • Summary
    4. Supervised learning (Classification) (Chapter 3)
      • Basic concepts
      • Decision trees
      • Classifier evaluation
      • Rule induction
      • Classification based on association rules
      • Naive-Bayesian learning
      • Naive-Bayesian learning for text classification
      • Support vector machines
      • K-nearest neighbor
      • Bagging and boosting
      • Summary
    5. Unsupervised learning (Clustering) (Chapter 4)
      • Basic concepts
      • K-means algorithm
      • Representation of clusters
      • Hierarchical clustering
      • Distance functions
      • Data standardization
      • Handling mixed attributes
      • Which clustering algorithm to use?
      • Cluster evaluation
      • Discovering holes and data regions
      • Summary
    6. Information retrieval and Web search (Sections 6.1 - 6.6, and 6.8)
      • Basic text processing and representation
      • Cosine similarity
      • Relevance feedback and Rocchio algorithm
    7. Semi-supervised learning (Sections 5.1.1, 5.1.2, 5.2.1 - 5.2.4)
      • LU learning: Learning from labeled and unlabeled examples
        • Learning from labeled and unlabeled examples using EM
        • Learning from labeled and unlabeled examples using co-training
      • PU learning: Learning from positive and unlabeled examples
    8. Social network analysis (Sections 7.1 - 7.4)
      • Centrality and prestige
      • Citation analysis: co-citation and bibliographic coupling
      • The PageRank algorithm (of Google)
      • The HITS algorithm: authorities and hubs
      • Mining communities on the Web
    9. Sentiment analysis and opinion mining (Sections 11.1 - 11.6; check out my two books)
      • Opinion mining problem
      • Document-level sentiment classification
      • Sentence-level subjectivity and sentiment classification
      • Aspect-level sentiment analysis
      • Mining comparative opinions
      • Opinion lexicon generation
    10. Recommender systems and collaborative filtering (Section 12.4)
      • Content-based recommendation
      • Collaborative filtering based recommendation
        • K-nearest neighbor
        • Association rules
        • Matrix factorization
    11. Web data extraction (Sections 9.1 and 9.2)
      • Wrapper induction
      • Automated extraction
    12. Lifelong machine learning (Chapter 1; Sections 2.1, 2.2, 3.1, 3.2, 3.4, 4.4, 4.5; Chapter 5)
      • What is lifelong machine learning?
      • Lifelong supervised learning
      • Lifelong unsupervised learning
      • Lifelong semi-supervised learning

    Projects - graded (you will demo your programs to me)


    Rules and Policies


    By Bing Liu, Jan 15, 2018