Lecture time: MW 4:30-5:45pm
Location: TBH 180F
Instructor: Prof. Elena Zheleva
Office hours: Tue 3-5pm, SEO 1140
Graduate TA: Shishir Adhikari
Office hours: TBD
"Our ability to collect, manipulate, analyze, and act on vast amounts of data is having a profound impact on all aspects of society. This transformation has led to the emergence of data science as a new discipline. The explosive growth of interest in this area has been driven by research in social, natural, and physical sciences with access to data at an unprecedented scale and variety, by industry assembling huge amounts of operational and behavioral information to create new services and sources of revenue, and by government, social services and non-profits leveraging data for social good. This emerging discipline relies on a novel mix of mathematical and statistical modeling, computational thinking and methods, data representation and management, and domain expertise."
--Committee on Data Science, Computing Research Association
This course provides an in-depth overview of data science from a computer science perspective. Topics include modeling, storage, manipulation, integration, classification, analysis, visualization, information extraction, and big data. The course is programming-intensive, and emphasis will be placed on tying data science concepts to specific real-world applications through hands-on experience.
Working knowledge of probability, data structures and algorithms, and the ability to (learn to) program in Python.
Programming-based homework assignments - 30%
Midterm exam - 20%
Bi-weekly quizzes - 15%
Class project - 35%
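The grading breakdown above amounts to a weighted sum of the four components. The following is a minimal sketch in Python. It assumes that each component is reported as a single score on a 0-100 scale. The names `WEIGHTS` and `final_grade` are hypothetical. The syllabus does not specify rounding, curving, or how individual homeworks and quizzes are combined into one component score.

```python
# Hypothetical sketch: component weights taken from the grading breakdown.
# Assumes each component score is already a single number on a 0-100 scale.
WEIGHTS = {
    "homework": 0.30,  # programming-based homework assignments
    "midterm": 0.20,   # midterm exam
    "quizzes": 0.15,   # bi-weekly quizzes
    "project": 0.35,   # class project
}

# The weights should account for the whole grade (30 + 20 + 15 + 35 = 100%).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course score (0-100) from per-component scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum over the four components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homework": 90, "midterm": 80, "quizzes": 70, "project": 95}
    print(round(final_grade(example), 2))
```

With this breakdown, the class project carries the most weight. A 10-point change in the project score moves the final score by 3.5 points, compared with 1.5 points for the same change in quizzes.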
No textbook is required. Readings will be assigned from multiple online sources, including:
[PTDS] Principles and Techniques of Data Science. Lau, Gonzalez, Nolan.
[MMD] Mining of Massive Datasets. Leskovec, Rajaraman, Ullman.
[FDV] Fundamentals of Data Visualization. Wilke.
[CIT] Computational and Inferential Thinking. Adhikari, DeNero.
[CIML] A Course in Machine Learning [Errata]. Daumé III.