Partially Supervised Classification

PU Learning - Learning from Positive and Unlabeled Examples


New Book: Web Data Mining - Exploring Hyperlinks, Contents and Usage Data

Funded by: NSF (National Science Foundation), Award No: IIS-0307239

To our knowledge, the term PU Learning was coined in our 2005 paper. It stands for positive and unlabeled learning, also called learning from positive and unlabeled examples. Our first paper on PU learning was published in ICML-2002, which focused on text classification. Note that Set Expansion is basically an instance of PU learning.

Definition (PU Learning): Given a set P of examples of a particular class (called the positive class) and a set U of unlabeled examples, which contains instances of both the positive class and the other classes (called the negative class), the goal is to build a binary classifier that classifies a test set T into two classes, positive and negative, where T can be U itself.
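To make the shape of the inputs concrete, here is a minimal sketch of how a PU data set can be simulated from fully labeled data by hiding labels. The function name `make_pu_split` and the parameter `labeled_fraction` are hypothetical names chosen for this illustration; they do not come from any of the papers listed below.

```python
import random

def make_pu_split(examples, labels, labeled_fraction, seed=0):
    """Simulate a PU data set from labeled data (illustrative helper).

    A fraction of the positive examples keeps its label and forms P.
    All remaining positives and all negatives lose their labels and form U.
    """
    rng = random.Random(seed)
    positives = [x for x, y in zip(examples, labels) if y == 1]
    negatives = [x for x, y in zip(examples, labels) if y != 1]
    rng.shuffle(positives)
    n_labeled = int(len(positives) * labeled_fraction)
    P = positives[:n_labeled]              # labeled positive set
    U = positives[n_labeled:] + negatives  # unlabeled mix of both classes
    rng.shuffle(U)
    return P, U

# Toy documents: label 1 marks the positive class, 0 everything else.
docs = ["p1", "p2", "p3", "p4", "n1", "n2", "n3"]
labels = [1, 1, 1, 1, 0, 0, 0]
P, U = make_pu_split(docs, labels, labeled_fraction=0.5)
print("P:", P)
print("U:", U)
```

A learner sees only P and U. When T is U, the task amounts to recovering the hidden positives inside U.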

Our ICML-2002 paper showed theoretically that P and U provide sufficient information for learning, and that PU learning can be posed as a constrained optimization problem. Some of our early algorithms are reported in (Liu et al. 2003), (Lee and Liu 2003), (Li and Liu 2003), etc.
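One way to write down the constrained optimization view is sketched below. The notation is assumed here for illustration and is not copied from the paper: f is a candidate classifier from a hypothesis class F, the indicator counts examples that f labels positive, and r is a target recall on the positive set.

```latex
\min_{f \in \mathcal{F}} \; \frac{1}{|U|} \sum_{x \in U} \mathbf{1}[f(x) = 1]
\quad \text{subject to} \quad
\frac{1}{|P|} \sum_{x \in P} \mathbf{1}[f(x) = 1] \;\ge\; r
```

Read informally: label as few unlabeled examples positive as possible while still recovering at least a fraction r of the known positives. Because U also contains positive instances, a classifier that satisfies the recall constraint on P will tend to keep the hidden positives in U on the positive side, so the objective mainly pushes the true negatives out.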

Read the following paper first: it summarizes the main early ideas, proposes a biased-SVM technique, and performs a comprehensive evaluation.
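As a rough illustration of the biased-SVM idea, the sketch below trains a linear soft-margin SVM in which every example in U is treated as a noisy negative, and errors on P are penalized more heavily than errors on U. This is a toy reimplementation under stated assumptions, not the code or the parameter settings from the paper: the plain subgradient-descent solver, the function names `train_biased_svm` and `predict`, the cost values `c_pos` and `c_neg`, the step-size schedule, and the 2-D data are all choices made for this example.

```python
def train_biased_svm(P, U, c_pos=10.0, c_neg=1.0, epochs=5000, lr0=0.01):
    """Minimize 0.5*||w||^2 + c_pos*hinge(P) + c_neg*hinge(U as negatives)."""
    data = [(x, 1.0, c_pos) for x in P] + [(x, -1.0, c_neg) for x in U]
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for t in range(epochs):
        lr = lr0 / (1.0 + 0.01 * t)      # decaying step size (assumed schedule)
        grad_w = list(w)                 # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for x, y, cost in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1.0:          # hinge loss is active for this example
                for j in range(dim):
                    grad_w[j] -= cost * y * x[j]
                grad_b -= cost * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 for the positive class and -1 for the negative class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: the learner is told only which points are in P and which are in U.
P = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (3.0, 2.0)]
hidden_pos = [(2.0, 3.0), (2.5, 2.5)]    # positives hidden inside U
negatives = [(-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5), (-3.0, -2.0)]
U = hidden_pos + negatives

w, b = train_biased_svm(P, U)
print([predict(w, b, x) for x in U])
```

The asymmetric costs are the point of the design: a large penalty on P reflects that its labels are reliable, while a small penalty on U tolerates the positives that are mislabeled as negative there. In practice the two cost values would be tuned on held-out data rather than fixed by hand.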

Publications

  1. Geli Fei and Bing Liu. "Social Media Text Classification under Negative Covariate Shift." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisboa, Portugal, 17-21 September 2015.

  2. Xiaoli Li, Lei Zhang, Bing Liu and See-Kiong Ng. "Distributional Similarity vs. PU Learning for Entity Set Expansion." In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10, short paper), July 11-16, 2010.

  3. Xiaoli Li, Bing Liu and See-Kiong Ng. "Negative Training Data can be Harmful to Text Classification." Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-10), Oct. 9-11, 2010, MIT, Massachusetts, USA.

  4. Xiaoli Li, Bing Liu and See-Kiong Ng. "Learning to Identify Unexpected Instances in the Test Set." Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2007. [PDF]

  5. Xiaoli Li, Bing Liu. "Learning from Positive and Unlabeled Examples with Different Data Distributions." European Conference on Machine Learning (ECML-05), 2005. [PDF]

  6. Bing Liu, Xiaoli Li, Wee Sun Lee and Philip Yu. "Text Classification by Labeling Words." Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), July 25-29, 2004, San Jose, California. [PDF]

  7. Xiaoli Li and Bing Liu. "Dealing with Different Distributions in Learning from Positive and Unlabeled Web Data." WWW-2004 poster paper. [PDF]

  8. Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu. "Semi-supervised Text Classification Using Partitioned EM." DASFAA 2004: 482-493. [PDF]

  9. Bing Liu, Yang Dai, Xiaoli Li, Wee Sun Lee and Philip Yu. "Building Text Classifiers Using Positive and Unlabeled Examples." Proceedings of the Third IEEE International Conference on Data Mining (ICDM-03), Melbourne, Florida, November 19-22, 2003. [PDF]

  10. Xiaoli Li, Bing Liu. "Learning to Classify Text Using Positive and Unlabeled Data." Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), Aug 9-15, 2003, Acapulco, Mexico.

  11. Wee Sun Lee, Bing Liu. "Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression." Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), August 21-24, 2003, Washington, DC, USA.

  12. Bing Liu, Wee Sun Lee, Philip S Yu and Xiaoli Li. "Partially Supervised Classification of Text Documents." Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002), July 8-12, 2002, Sydney, Australia.

Software


NSF Grant Report, Aug 5, 2003. IDM 2003 Workshop, September 14-16, 2003, Seattle, Washington.


Acknowledgments

This project is currently supported by the National Science Foundation under Grant No. IIS-0307239. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Created on July 20, 2003 by Bing Liu.