CS 583 - Spring 2019 (two sections)

Data Mining and Text Mining

Course Objective

This course has three objectives. First, to provide students with a sound basis in data mining tasks and techniques. Second, to ensure that students are able to read and critically evaluate data mining research papers. Third, to ensure that students are able to implement and use some of the important data mining and text mining algorithms.

Think and Ask!

If you have questions about any topic or assignment, DO ASK me or even your classmates for help. I am here to make the course understood. DO NOT delay your questions. There is no such thing as a stupid question. The only obstacle to learning is laziness.

General Information

Instructor: Bing Liu
- Email: Bing Liu
- Tel: (312) 355 1318
- Office: North end, 3rd floor, library
Teaching Assistants:
- Section 1: Sahisnu Mazumder, sahisnumazumder@gmail.com
- Section 2: Shuai Wang, swang207@uic.edu
- Office hours: by appointment

Section 1

Course Call Number: 25479
Lecture time slot:
- 12:30-1:45pm Tuesday & Thursday
Lecture hall: LC A7
Office hours: 2:00pm-3:00pm, Tuesday & Thursday (or by appointment)

Section 2

Course Call Number: 39840
Lecture time slot:
- 3:30-4:45pm Tuesday & Thursday
Lecture hall: LC A4
Office hours: 2:00pm-3:00pm, Tuesday & Thursday (or by appointment)

Grading

Quizzes: 10% (2 or more)

Date and time: TBA

Midterm: 30% (2 midterms)

Date and time: TBA

Final Exam: 40%

Date and time: TBA

Projects: Done in groups (2 students per group)

Project 1: TBA (10%) (algorithm implementation)
- Demo date: TBA
Project 2: TBA (10%) (research)
- Demo date: TBA
- Report due: TBA
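
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes that the 30% midterm weight covers both midterms together (the listed weights then sum to 100%) and that each component is first averaged to a 0-100 percentage. The component names and the example scores are hypothetical, and no letter-grade cutoffs are implied.

```python
# Weights taken from the Grading section (percent of the final grade).
WEIGHTS = {
    "quizzes": 10,
    "midterms": 30,    # assumption: 30% covers both midterms combined
    "final_exam": 40,
    "project_1": 10,
    "project_2": 10,
}


def course_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    """
    # Sanity check: the listed weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example = {
    "quizzes": 90,
    "midterms": 80,
    "final_exam": 75,
    "project_1": 100,
    "project_2": 95,
}
print(course_score(example))  # prints 82.5
```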

Prerequisites

Knowledge of probability and algorithms
Any programming language may be used for projects

Teaching materials

Required Textbook:
- Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, by Bing Liu, Second Edition, Springer, July 2011, ISBN 978-3-642-19459-7.
References:
- Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, ISBN 1-55860-489-8.
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
- Data Mining, by Charu Aggarwal, Springer, 2015, ISBN 978-3-319-14142-8.
- Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7.
- Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The MIT Press, ISBN 0-262-08290-X.
- Lifelong Machine Learning, by Zhiyuan Chen and Bing Liu, Morgan & Claypool Publishers, November 2016.
- Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, by Bing Liu, Cambridge University Press, 2015.
Data mining resource site: KDnuggets Directory

Topics (subject to change; the reading list follows each chapter title)

Introduction
Data pre-processing
- Data cleaning
- Data transformation
- Data reduction
- Discretization
Association rules and sequential patterns (Sections 2.1 - 2.7)
- Basic concepts
- Apriori Algorithm
- Mining association rules with multiple minimum supports
- Mining class association rules
- Sequential pattern mining
- Summary
Supervised learning (Classification) (Chapter 3)
- Basic concepts
- Decision trees
- Classifier evaluation
- Rule induction
- Classification based on association rules
- Naive-Bayesian learning
- Naive-Bayesian learning for text classification
- Support vector machines
- K-nearest neighbor
- Bagging and boosting
- Summary
Unsupervised learning (Clustering) (Chapter 4)
- Basic concepts
- K-means algorithm
- Representation of clusters
- Hierarchical clustering
- Distance functions
- Data standardization
- Handling mixed attributes
- Which clustering algorithm to use?
- Cluster evaluation
- Discovering holes and data regions
- Summary
Information retrieval and Web search (Sections 6.1 - 6.6, and 6.8)
- Basic text processing and representation
- Cosine similarity
- Relevance feedback and Rocchio algorithm
Semi-supervised learning (Sections 5.1.1, 5.1.2, 5.2.1 - 5.2.4)
- LU learning: Learning from labeled and unlabeled examples
  - Learning from labeled and unlabeled examples using EM
  - Learning from labeled and unlabeled examples using co-training
- PU learning: Learning from positive and unlabeled examples
Social network analysis (Sections 7.1 - 7.4)
- Centrality and prestige
- Citation analysis: co-citation and bibliographic coupling
- The PageRank algorithm (of Google)
- The HITS algorithm: authorities and hubs
- Mining communities on the Web
Sentiment analysis and opinion mining (Sections 11.1 - 11.6; check out my two books)
- Opinion mining problem
- Document-level sentiment classification
- Sentence-level subjectivity and sentiment classification
- Aspect-level sentiment analysis
- Mining comparative opinions
- Opinion lexicon generation
Recommender systems and collaborative filtering (Section 12.4)
- Content-based recommendation
- Collaborative filtering based recommendation
  - K-nearest neighbor
  - Association rules
  - Matrix factorization
Web data extraction (Sections 9.1 and 9.2)
- Wrapper induction
- Automated extraction
Lifelong machine learning (Chapter 1; Sections 2.1, 2.2, 3.1, 3.2, 3.4, 4.4, 4.5; Chapter 5)
- What is lifelong machine learning?
- Lifelong supervised learning
- Lifelong unsupervised learning
- Lifelong semi-supervised learning

Projects - graded (you will demo your programs to me)

Each group consists of 2 students and will work on two assignments:
1. Algorithm implementation: TBA
2. Research: TBA

Rules and Policies

Statute of limitations: No grading questions or complaints, no matter how justified, will be considered more than one week after the item in question has been returned.
Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students' work (this includes exams and programs) will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a one-letter drop in the final course grade. The MAXIMUM penalty will be expulsion from the University.
MOSS: Sharing code with your classmates is not acceptable!!! All programs will be screened using the Moss (Measure of Software Similarity) system.
Late assignments: Late assignments will not, in general, be accepted. They will never be accepted if the student has not made special arrangements with me at least one day before the assignment is due. If a late assignment is accepted, it is subject to a reduction in score as a late penalty.

By Bing Liu, Jan 15, 2018