2025-06-03 - Paper accepted at KDD 2025
Repairing Labeling Functions based on Small Sets of Labeled Examples
Congrats to Chenjie on his accepted KDD paper. In this work [1], a collaboration with Sudeepa, Amir, and Zhengjie, we use small sets of labeled examples to improve the performance of labeling functions in programmatic weak supervision.
Programmatic weak supervision (PWS) significantly reduces the human effort of labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends directly on the accuracy of the LFs. To improve the quality of a given set of LFs, we study the problem of fixing LFs based on a small set of labeled examples. To this end, we develop novel techniques for repairing a set of LFs by minimally changing their results on the labeled examples such that the repaired LFs ensure that (i) there is sufficient evidence for the correct label of each labeled datapoint and (ii) the accuracy of each repaired LF is sufficiently high. We model LFs as conditional rules, which enables us to refine them, i.e., to selectively change their output for some inputs. We demonstrate that our system improves the quality of LFs, whether they are developed by a human expert or generated automatically by tools like Witan, based on surprisingly small sets of labeled datapoints.
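To make the idea of refining an LF concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the spam/ham task, the three keyword LFs, the `refine` helper, and the hand-picked refinement condition are all hypothetical, and a plain majority vote stands in for whatever label model a PWS system would actually use. It only illustrates how treating an LF as a conditional rule lets us change its output for some inputs while leaving the rest untouched.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

ABSTAIN = None  # an LF may decline to label a datapoint
LF = Callable[[str], Optional[str]]

# Hypothetical keyword LFs for a toy spam/ham task (illustration only).
def lf_free(text: str) -> Optional[str]:
    return "spam" if "free" in text.lower() else ABSTAIN

def lf_prize(text: str) -> Optional[str]:
    return "spam" if "prize" in text.lower() else ABSTAIN

def lf_meeting(text: str) -> Optional[str]:
    return "ham" if "meeting" in text.lower() else ABSTAIN

def majority_vote(lfs: List[LF], text: str) -> Optional[str]:
    """Combine LF outputs by majority vote; abstain on ties or when no LF fires."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

def refine(lf: LF, condition: Callable[[str], bool], new_label: Optional[str]) -> LF:
    """Wrap lf in an extra conditional: where condition holds, return new_label;
    everywhere else the original output is kept."""
    def refined(text: str) -> Optional[str]:
        return new_label if condition(text) else lf(text)
    return refined

def accuracy(lf: LF, labeled: List[Tuple[str, str]]) -> Optional[float]:
    """Accuracy of lf over the labeled examples on which it does not abstain."""
    fired = [(lf(x), y) for x, y in labeled if lf(x) is not ABSTAIN]
    if not fired:
        return None
    return sum(pred == gold for pred, gold in fired) / len(fired)

# A small set of labeled examples (made up for this sketch).
labeled = [
    ("Are you free for lunch tomorrow?", "ham"),
    ("Claim your free prize now", "spam"),
    ("Free entry to win cash", "spam"),
    ("Meeting moved to 3pm", "ham"),
]

# lf_free mislabels the first example; a hand-chosen refinement fixes it.
lf_free_refined = refine(lf_free, lambda t: "lunch" in t.lower(), "ham")

before = accuracy(lf_free, labeled)
after = accuracy(lf_free_refined, labeled)
vote_before = majority_vote([lf_free, lf_prize, lf_meeting], labeled[0][0])
vote_after = majority_vote([lf_free_refined, lf_prize, lf_meeting], labeled[0][0])
```

In this toy setup the refined LF changes its output only on the one labeled example that matches the added condition. The paper's contribution is to find such repairs automatically and with minimal change, subject to the evidence and accuracy requirements described above; the sketch picks the repair by hand.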
[1] Refining Labeling Functions With Limited Labeled Data
Chenjie Li, Amir Gilad, Boris Glavic, Zhengjie Miao and Sudeepa Roy
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2025).

@inproceedings{LG25,
  author = {Li, Chenjie and Gilad, Amir and Glavic, Boris and Miao, Zhengjie and Roy, Sudeepa},
  title = {Refining Labeling Functions With Limited Labeled Data},
  booktitle = {{ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}},
  year = {2025},
  pdfurl = {http://www.cs.uic.edu/%7ebglavic/dbgroup/assets/pdfpubls/LG25.pdf},
  istoappear = true,
  venueshort = {{KDD}},
  longversionurl = {https://arxiv.org/pdf/2505.23470},
  keywords = {Programmatic Weak Supervision}
}