CS 583 Fall 2021
Data Mining and Text Mining
Course Objective
This course has three objectives. First, to provide students with a solid background in classic data mining and machine learning techniques and to introduce the latest research topics (e.g., out-of-distribution (novelty) detection, learning after model deployment, and lifelong/continual learning). Second, to ensure that students are able to read and critically evaluate data mining research papers. Third, to ensure that students are able to implement and use some of the important data mining and text mining algorithms.
Think and Ask! If you have questions about any topic or assignment, DO ASK me, the TA, or
your classmates for help; I am here to help you understand the course
material. DO NOT delay your questions. There is no such thing as a
stupid question. The only obstacle to learning is laziness.
General Information
- Instructor: Bing Liu
- Email: Bing Liu
- Office: CS 3190c, North End, 3rd Floor, Library
- Teaching Assistant (TA)
- Sections 1 and 2: Sepideh Esmaeilpourcharandabi
- Email: sesmae2@uic.edu
Section 1
- Course Call Number: 30286
- Lecture time slots: 12:30 - 1:45pm Tue & Thu
- Lecture hall: LH 312
- Instructor office hours: 10:30am - 12:00pm Tue
- TA office hour: 11am - 12pm on Wed (CS offices in Library - North end, third floor)
Section 2
- Course Call Number: 45283
- Lecture time slots: 2:00pm - 3:15pm Tue & Thu
- Lecture hall: LH 312
- Instructor office hours: 10:30am - 12:00pm Tue
- TA office hour: 11am - 12pm on Wed (CS offices in Library - North end, third floor)
Grading
Final Exam: TBA
Midterm: TBA
Quizzes: TBA
Assignments: TBA
- Programming assignments: TBA
- A small text mining research project: TBA
Assignments and the research project are done in groups of 2. Discussions with
other students are allowed, but each group must write its own code.
- Grading: live demo + code submission
- MOSS: Sharing code with your classmates is not acceptable! All programs will be screened using the MOSS (Measure of Software Similarity) system.
Prerequisites
- Knowledge of probability and algorithms
- Any programming language may be used for the projects
Teaching materials
- Required Textbooks:
- Web Data Mining - Exploring Hyperlinks, Contents and Usage Data, by Bing Liu, Second Edition, Springer, July 2011, ISBN 978-3-642-19459-7
- Lifelong Machine Learning, by Zhiyuan Chen and Bing Liu, Morgan & Claypool Publishers, 2018 (second edition).
- Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, by Bing Liu, Cambridge University Press, 2020 (second edition).
- References
- Data mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, ISBN 1-55860-489-8.
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
- Data Mining, by Charu Aggarwal, Springer, 2015. ISBN 978-3-319-14142-8
- Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7
- Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth, The MIT Press, ISBN 0-262-08290-X.
- Data mining resource site: KDnuggets Directory
Topics (subject to change; the reading list follows each chapter title)
- Introduction
- Data pre-processing
- Data cleaning
- Data transformation
- Data reduction
- Discretization
- Association rules and sequential patterns (Sections 2.1 - 2.7)
- Apriori Algorithm
- Mining association rules with multiple minimum supports
- Mining class association rules
- Sequential pattern mining
- Summary
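As a preview of the style of algorithm covered in this chapter, here is a minimal sketch of Apriori-style frequent itemset mining (illustrative only; the course follows the textbook's presentation, and the transaction data below is made up):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Find all itemsets appearing in at least minsup transactions."""
    # Level 1: count single items and keep the frequent ones.
    items = sorted({i for t in transactions for i in t})
    freq, level = {}, []
    for i in items:
        count = sum(1 for t in transactions if i in t)
        if count >= minsup:
            freq[frozenset([i])] = count
            level.append(frozenset([i]))
    k = 2
    while level:
        # Candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        level = []
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= minsup:
                freq[c] = count
                level.append(c)
        k += 1
    return freq

transactions = [{"beer", "chips"}, {"beer", "chips", "salsa"},
                {"chips", "salsa"}, {"beer", "chips"}]
print(apriori(transactions, minsup=2))
```

A real implementation would also generate association rules from the frequent itemsets and use a relative (percentage) minimum support; this sketch stops at itemset counting.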
- Supervised learning (Classification) (Chapter 3)
- Decision tree induction
- Classifier evaluation
- Rule induction and classification based on association rules
- Naive-Bayesian learning
- Naive-Bayesian learning for text classification
- Support vector machines
- K-nearest neighbor
- Bagging and boosting
- Summary
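To give a feel for one of the classifiers in this chapter, below is a minimal sketch of multinomial naive Bayes for text classification with add-one (Laplace) smoothing (illustrative only; the toy documents and labels are made up):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train multinomial naive Bayes. docs: list of (word_list, label)."""
    class_docs = Counter()               # documents per class (for priors)
    word_counts = defaultdict(Counter)   # word frequencies per class
    for words, label in docs:
        class_docs[label] += 1
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return class_docs, word_counts, vocab

def predict_nb(model, words):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best, best_lp = None, float("-inf")
    for c in class_docs:
        # log P(c) + sum over words of log P(w|c), with add-one smoothing.
        lp = math.log(class_docs[c] / total_docs)
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[c][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [("great fun movie".split(), "pos"),
        ("fun and great".split(), "pos"),
        ("boring bad movie".split(), "neg")]
model = train_nb(docs)
print(predict_nb(model, "great movie".split()))
```

Working in log space avoids the floating-point underflow that multiplying many small word probabilities would cause on real documents.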
- Unsupervised learning (Clustering) (Chapter 4)
- K-means algorithm
- Representation of clusters
- Hierarchical clustering
- Distance functions and data standardization
- Cluster evaluation
- Discovering holes and data regions
- Summary
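The k-means algorithm from this chapter alternates two simple steps; a minimal sketch (illustrative only, with made-up 2-D points and random initial centroids chosen from the data):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means; points are lists of coordinates of any dimension."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, cl in zip(centroids, clusters):
            if cl:
                new_centroids.append([sum(x) / len(cl) for x in zip(*cl)])
            else:
                new_centroids.append(c)  # keep centroid of an empty cluster
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Because the result depends on the random initial centroids, practical use typically runs k-means several times and keeps the clustering with the lowest within-cluster sum of squared errors.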
- Semi-supervised learning (Sections 5.1.1, 5.1.2, 5.2.1 - 5.2.4)
- LU learning: Learning from labeled and unlabeled examples
- PU learning: Learning from positive and unlabeled examples
- Novelty (or out-of-distribution) detection
- Lifelong and continual learning (the Lifelong Machine Learning book and research papers)
- Introduction to lifelong/continual learning
- Class and task continual learning
- Open-world learning
- Learning after model deployment (on-the-job learning)
- Recommender systems
- Content-based recommendation
- Collaborative filtering based recommendation
- K-nearest neighbor
- Association rules
- Matrix factorization
- Introduction to Information retrieval and Web search (Sections 6.1 - 6.6, and 6.8)
- Information retrieval models
- Basic text processing and representation
- Cosine similarity
- Relevance feedback and Rocchio algorithm
- Social network analysis (Sections 7.1 - 7.4)
- Centrality and prestige
- Citation analysis: co-citation and bibliographic coupling
- The PageRank algorithm (of Google)
- The HITS algorithm: authorities and hubs
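PageRank, listed above, can be sketched as a simple power iteration over a link graph (illustrative only; the three-page graph is made up, and dangling pages with no outlinks are not handled here):

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iters):
        new = {}
        for p in pages:
            # Rank flowing into p: each linking page q shares its rank
            # equally among its outlinks.
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * incoming
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))
```

With damping factor d = 0.85, each iteration preserves the total rank mass of 1, and the scores converge quickly; here C ends up ranked highest because both A and B link to it.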
- Sentiment analysis and opinion mining (Sections 11.1 - 11.6; check out my two books)
- Sentiment analysis and emotion analysis problems
- Document-level Sentiment classification
- Sentence-level subjectivity and sentiment classification
- Aspect-level sentiment analysis
- Mining comparative opinions
- Sentiment lexicon generation
- Continuous learning dialogue systems
- Introduction to dialogue systems
- Continual learning in intent classification
- Continuous learning of new language expressions
- Continuous learning of new world knowledge
Rules and Policies
- Statute of limitations: No grading questions or complaints, no matter how justified, will be considered more than one week after the item in question has been returned.
- Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students' work (this includes exams and programs) will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a one-letter drop in the final course grade. The MAXIMUM penalty will be expulsion from the University.
- Late submission: Late submissions of assignments or quizzes will not be accepted unless due to extraordinary circumstances.
UIC Counseling Center
We value your mental health and emotional wellness as part of the UIC student experience. The UIC Counseling Center offers an array of services to provide additional support throughout your time at UIC, including workshops, peer support groups, counseling, self-help tools, and initial consultations to speak to a mental health counselor about your concerns. Please visit the Counseling Center website for more information (https://counseling.uic.edu/). Further, if you think emotional concerns may be impacting your academic success, please contact your faculty and academic advisers to create a plan to stay on track.
My Home Page
By Bing Liu, Aug 15, 2020