Time: MW 3-4:15pm
Location: TBH 180E
Instructor: Prof. Elena Zheleva
Office hours: Tue 3-4pm, SEO 1140
Data science plays an increasingly important role in many aspects of our lives, from personalizing our online (and offline) experiences to helping us make sense of large, heterogeneous datasets in various domains. A holistic theoretical understanding of data science requires familiarity with relevant methodologies from computer science, statistics, economics, and physics. The assigned course readings will give an overview of research in this area, including topics in 1) machine learning, 2) causal inference and dealing with bias, 3) network science, and 4) privacy and ethics. The course emphasizes network modeling, because many real-world data sources can be represented as networks of interconnected objects, such as social networks, citation networks, and the World Wide Web.
This is a seminar course. The goal of the course is to expose graduate students to state-of-the-art research on reasoning about network data. The class project plays a central role in the course, and it should be taken as an opportunity to connect your research area of interest to the course topics.
Machine learning for networks: statistical relational learning, collective classification, link prediction, network clustering, graphical models, stochastic blockmodels, entity resolution in networks, anomaly detection, graph identification (approx. 5 weeks)
Causal inference and bias: controlled and natural experiments, propensity scores and matching, causal inference modeling, structural equations, causal Bayesian models, interventions, different types of bias, peer and network effects, connecting predictive and causal modeling (approx. 4-5 weeks)
Network science: random networks, evolving networks, scale-free property, modularity, centrality measures and communities, information diffusion models (approx. 1-2 weeks)
Privacy and ethics: known vulnerabilities and challenges, anonymization, differential privacy, personalized privacy assistants, fairness, accountable algorithms (approx. 2 weeks)
Paper summaries/discussion - 30%
Presentations - 20%
Course project - 50% (proposal, progress report, final presentation, final report)
CS 412 Machine Learning or equivalent; or consent of the instructor.
We will be using Piazza for all course discussions and materials. Students registered for the course will be sent an enrollment email before the first day of class.
|8/28||Introduction and syllabus|
|8/30||Predictive modeling in networks||"Link mining," L. Getoor, C. Diehl. SIGKDD Explorations 2005.||
Sign up for topics
Quiz on prerequisites
|9/4||Labor Day - No class|
|9/6||Graphical models||"Graphical models in a nutshell," D. Koller, N. Friedman, L. Getoor, B. Taskar. Book chapter in Introduction to Statistical Relational Learning, 2007.|
"Collective classification in network data," P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, T. Eliassi-Rad. AI Magazine 2008.
Optional: "Linearized and single-pass belief propagation," W. Gatterbauer, S. Gunnemann, D. Koutra, C. Faloutsos. VLDB 2015.
"Collective entity resolution in relational data," I. Bhattacharya, L. Getoor. SDM 2007.
Optional: "Large-scale collective entity matching," V. Rastogi, N. Dalvi, M. Garofalakis. VLDB 2011.
"The link prediction problem for social networks," D. Liben-Nowell, J. Kleinberg. CIKM 2003.
"Modeling relationship strength in online social networks," R. Xiang, J. Neville, M. Rogati. WWW 2010.
Optional: "Link prediction in relational data," B. Taskar, M. Wong, P. Abbeel, D. Koller. NIPS 2003.
|9/20||Network science||"The structure and function of complex networks," M. Newman. SIAM Review 2003.|
|9/25||Network properties and evolution||
"Graphs over time: Densification laws, shrinking diameters and possible explanations," J. Leskovec, J. Kleinberg, C. Faloutsos. KDD 2005.
"Co-evolution of social and affiliation networks," E. Zheleva, H. Sharara, L. Getoor. KDD 2009.
Optional: "Power-law distributions in empirical data," A. Clauset, C. Shalizi, M. Newman. SIAM Review 2009.
"What is Twitter, a social network or a news media?" H. Kwak, C. Lee, H. Park, S. Moon. WWW 2010.
"Shaping social activity by incentivizing users," M. Farajtabar, N. Du, M. Rodriguez, I. Valera, H. Zha, L. Song. NIPS 2014.
Optional: "Optimizing the effectiveness of incentivized social sharing," J. Pfeiffer, E. Zheleva. ASONAM 2017.
|Project proposal due|
|10/2||Network clustering and community finding||
"Finding and evaluating community structure in networks," M. Newman and M. Girvan. Phys. Rev. E, 2004.
"Efficient discovery of overlapping communities in massive networks," P. Gopalan, D. Blei. PNAS 2013.
Optional: "Defining and evaluating communities based on ground truth," J. Yang, J. Leskovec. ICDM 2012.
"Neighborhood formation and anomaly detection in bipartite graphs," J. Sun, H. Qu, D. Chakrabarti, C. Faloutsos. ICDM 2005.
"Scalable anomaly ranking of attributed neighborhoods," B. Perozzi, L. Akoglu. SDM 2016.
"Group recommendation: Semantics and efficiency," S. Amer-Yahia, S. Roy, A. Chawla, G. Das, C. Yu. VLDB 2009.
"Directed edge recommender system," I. Kotsogiannis, E. Zheleva, A. Machanavajjhala. WSDM 2017.
|10/11||Statistical relational learning||
"Probabilistic similarity logic," M. Broecheler, L. Mihalkova, L. Getoor. UAI 2010.
Optional: "Hinge-loss Markov Random Fields and probabilistic soft logic," S. Bach, M. Broecheler, B. Huang, L. Getoor. 2015.
|10/16||Causal inference||"Statistics and causal inference," P. Holland. JASA 1986.|
|10/18||Causal inference and ML||
"Causal inference in economics and marketing," H. Varian. PNAS 2016.
"Prediction and explanation in social systems," J. Hofman, A. Sharma, D. Watts. Science 2017.
|10/23||Estimating causal effects||
"Recursive partitioning for heterogeneous causal effects," S. Athey, G. Imbens. PNAS 2016.
Optional: "Estimating causal effects of treatments in randomized and nonrandomized studies," D. Rubin, Journal of Ed. Psychology 1974.
|10/25||Causality and networks||"Homophily and contagion are generically confounded in observational social network studies," C. Shalizi, A. Thomas. Sociological Methods and Research, 40, 2011.|
"Detecting network effects: Randomizing over randomized experiments," M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, W. Duan, S. Ghosh, Y. Xu, E. Airoldi. KDD 2017.
"Estimating peer effects in networks with peer encouragement designs," D. Eckles, R. Kizilcec, E. Bakshy. PNAS 2016.
Optional: "Creating social contagion through firm-mediated message design: evidence from a randomized field experiment," T. Sun, S. Viswanathan, E. Zheleva. ICIS 2014.
"The role of social networks in information diffusion," E. Bakshy, I. Rosenn, C. Marlow, L. Adamic. WWW 2012.
|Progress report due|
|11/1||Observational studies: natural experiments and propensity scores||
"Online actions with offline impact: how online social networks influence online and offline user behavior," T. Althoff, P. Jindal, J. Leskovec. WSDM 2017.
"Distilling the outcomes of personal experiences: A propensity-scored analysis of social media," A. Olteanu, O. Varol, E. Kiciman. CSCW 2017.
Optional: "Reducing bias in observational studies using subclassification on the propensity score." P. Rosenbaum, D. Rubin. JASA 1984.
|11/6||Causal Bayesian networks||
"Causal Bayesian Networks," Chapter 1.3, and
"Functional Causal Models," Chapter 1.4 in Causality by Judea Pearl.
|11/8||Causal Bayesian networks||
"Causal inference and the data-fusion problem," E. Bareinboim, J. Pearl. PNAS 2016.
"Recovering from selection bias in causal and statistical inference," E. Bareinboim, J. Tian, J. Pearl. AAAI 2014.
|11/13||Causal Bayesian networks||"Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks," J. Mooij, J. Peters, D. Janzing, J. Zscheischler, B. Scholkopf. JMLR 2016.|
|11/15||Bias and data science||
"Unbiased learning-to-rank with biased feedback," T. Joachims, A. Swaminathan, T. Schnabel. WSDM 2017.
Optional: "Social data: biases, methodological pitfalls, and ethical boundaries," A. Olteanu, C. Castillo, F. Diaz, E. Kiciman. 2016.
|11/20||Privacy and networks||
"Towards identity anonymization on graphs," M. Hay, G. Miklau, D. Jensen, D. Towsley, P. Weis. VLDB 2008.
"To join or not to join: The illusion of privacy in social networks with mixed public and private user profiles," E. Zheleva, L. Getoor. WWW 2009.
Optional: "Analyzing graphs with node differential privacy," S. Kasiviswanathan, K. Nissim, S. Raskhodnikova, A. Smith. TCC 2013.
|11/22||Personalized privacy assistants||
"Privacy wizards for social networking sites," L. Fang, K. LeFevre. WWW 2010.
"Follow my recommendations: A personalized privacy assistant for mobile app permissions," B. Liu, M. Andersen, F. Schaub, H. Almuhimedi, S. Zhang, N. Sadeh, A. Acquisti, Y. Agarwal. SOUPS 2016.
Optional: "Making privacy personal: profiling social network users to inform privacy education and nudging," P. Wisniewski, B. Knijnenburg, H. Lipford. IJHCS 2017.
|11/27||Fairness and accountable algorithms||
"Fair prediction with disparate impact: a study of bias in recidivism prediction instruments," A. Chouldechova. Big Data 2017.
"FairTest: Inferring unwarranted associations in data-driven applications," F. Tramer et al. Euro S&P 2017.
Optional: "Fairness through awareness," C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel. ITCS 2012.
|12/11||Finals week||Final report due|