Projects related to Responsible Data Science and Algorithmic Fairness
Big data technologies have affected every corner of human life and society. These technologies have made our lives unimaginably more shared, connected, convenient, and cost-effective. Using data-driven technologies gives us the ability to make wiser decisions, and can help make society safer, more equitable and just, and more prosperous.
On the other hand, even if it looks promising, data-driven decision making can cause harm.
Probably the main reason is that real-life social data is almost always "biased". No one can miss the extensive recent discussion about race in the context of policing and criminal justice. But similar questions arise in many other domains as well.
Take college admission, for example. It has been shown that GPA values exhibit gender bias. This is due to grading policies that, for instance, may reduce grades for students with late homework, disruptive behavior, or inattention. As a result, using GPA as one of the features for generating scores and ranking students, without considering the inherent bias in the data, can lead to gender-biased outcomes.
Evidence of bias has also been reported in recommendation systems, advertisement, job interviewing, hiring, and promotion, among others.
In order to minimize societal harms of data-driven technologies, and to ensure that objectives such as fairness, equity, diversity, robustness, accountability, and transparency are satisfied, we aim to develop proper algorithms, tools, strategies, and metrics.
In particular, we divide our effort into three categories:
- Data Preparation & Investigation:
- The focus of this category is on bias in data (colored in purple in the figure).
Social data is almost always biased as it inherently reflects historical biases and stereotypes. Data collection and representation methods often introduce additional bias.
Using biased data without paying attention to societal impacts can create a feedback loop, and even increase discrimination in society.
Projects in this category aim to investigate the data used for building models and algorithms in order to (i) identify bias, (ii) mitigate bias, and (iii) annotate data with information that shows its fitness for use.
- Algorithm & Model Design:
- In particular, our focus is on score-based evaluation (colored in pink in the figure).
The scores are often derived by combining multiple criteria (aka features or attributes).
For instance, a lender may combine attributes such as payment history, salary, education, and age to develop a creditworthiness score for each customer.
The scores can be generated with different methods, linearly or using a complex function, and be used for different purposes.
In classification, scores are used to draw a decision boundary to specify, for example, if a woman is at risk of developing invasive breast cancer over the next 5 years.
In ranking, the scores are used to sort the entities and, for example, select the top-8 soccer teams for seeding pot 1 in the World Cup tournament.
The scores are usually assigned either (i) through a process learned by machine learning models using some labeled training data, or (ii) using a weight vector or a procedure designed by human experts.
Our objective in this set of projects is to mitigate the bias in scoring algorithms to generate fair and stable outcomes.
- Data Presentation & Output Investigation:
The projects in this category (i) provide tools for investigating and mitigating bias in the outcome of algorithms, and (ii) study how data presentation can introduce bias.
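To make the score-based setting described above more concrete, here is a minimal Python sketch of a linear scoring function that is used both for ranking and for threshold-based classification, followed by a simple check of how often a group appears in the top-k. It is only an illustration under stated assumptions, not one of the algorithms developed in these projects: the `Applicant` type, the group labels `G1` and `G2`, the feature values, the weight vectors, and the top-k share measure are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    group: str       # hypothetical demographic label, used only for auditing
    features: tuple  # hypothetical attributes, assumed normalized to [0, 1]


def score(features, weights):
    """Linear score: the weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def rank(items, weights):
    """Sort items by descending score; equal scores keep their input order."""
    return sorted(items, key=lambda a: score(a.features, weights), reverse=True)


def classify(items, weights, threshold):
    """Label each item True when its score reaches the decision threshold."""
    return {a.name: score(a.features, weights) >= threshold for a in items}


def top_k_share(ranked, k, group):
    """Fraction of the first k ranked items that belong to the given group."""
    top = ranked[:k]
    if not top:
        raise ValueError("k must select at least one item")
    return sum(1 for a in top if a.group == group) / len(top)


# Made-up data: two features per applicant, two groups.
applicants = [
    Applicant("a", "G1", (0.9, 0.4)),
    Applicant("b", "G2", (0.6, 0.9)),
    Applicant("c", "G1", (0.8, 0.5)),
    Applicant("d", "G2", (0.5, 0.8)),
]

# Two expert-designed weight vectors that emphasize different features.
for weights in [(0.7, 0.3), (0.3, 0.7)]:
    ranked = rank(applicants, weights)
    print(weights,
          [a.name for a in ranked],
          {g: top_k_share(ranked, 2, g) for g in ("G1", "G2")})
```

In this toy data, moving weight from the first feature to the second changes which group fills the top-2, which illustrates why the choice of weights matters for both the fairness and the stability of the outcome.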
Below, you can find more details about each of the projects. You can also refer to the following publications for more details.
- (Blog Post) Abolfazl Asudeh. Enabling Responsible Data Science in Practice. ACM SIGMOD Blog, Jan. 2021.
- (Tutorial) Abolfazl Asudeh, HV Jagadish. Fairly Evaluating and Scoring Items in a Data Set. PVLDB, 2020, VLDB Endowment.
- Presentation videos and slides here (strongly recommended to watch)
- (Invited Paper) Abolfazl Asudeh, HV Jagadish, Julia Stoyanovich. Towards Responsible Data-driven Decision Making in Score-Based Systems. Data Engineering Bulletin, Vol. 42(3), pages 76-87, 2019, Special Issue on Fairness, Diversity, and Transparency in Data Systems.