CS 418 Introduction to Data Science
University of Illinois at Chicago, Spring 2022


Lecture time: TTh 12:30-1:45pm

Location: First two weeks online, then SES 201


Instructor: Prof. Elena Zheleva

Office hours: TBD

Contact:


Graduate TA: Shishir Adhikari

Office hours: TBD


Graduate TA 2: Zahra Fatemi

Office hours: TBD

"Our ability to collect, manipulate, analyze, and act on vast amounts of data is having a profound impact on all aspects of society. This transformation has led to the emergence of data science as a new discipline. The explosive growth of interest in this area has been driven by research in social, natural, and physical sciences with access to data at an unprecedented scale and variety, by industry assembling huge amounts of operational and behavioral information to create new services and sources of revenue, and by government, social services and non-profits leveraging data for social good. This emerging discipline relies on a novel mix of mathematical and statistical modeling, computational thinking and methods, data representation and management, and domain expertise."
--Committee on Data Science, Computing Research Association

Course description

This course provides an in-depth overview of data science from a computer science perspective. Topics include modeling, storage, manipulation, integration, classification, analysis, visualization, information extraction, and big data. The course is programming-intensive and emphasizes tying data science concepts to specific real-world applications through hands-on experience.

Prerequisites

Working knowledge of probability, data structures and algorithms, and ability to (learn to) program in Python.

Course materials

We will use Piazza for the course schedule, discussions, and materials, and Gradescope for grading.
Python is the programming language used for homework assignments.

Textbooks

No textbook is required. Readings will be assigned from multiple online sources, including:
[PTDS] Principles and techniques of data science. Lau, Gonzalez, Nolan.
[MMD] Mining of massive datasets. Leskovec, Rajaraman, Ullman.
[FDV] Fundamentals of data visualization. Wilke.
[CIT] Computational and inferential thinking. Adhikari, DeNero.
[CIML] A course in machine learning [Errata]. Hal Daume III.