Open-World Machine Learning and Classification
(Open-world Recognition, Open Set Recognition, Open-world AI)
A form of Lifelong Learning
"Lifelong Machine Learning."
by Z. Chen and B. Liu, Morgan & Claypool, August 2018 (1st edition, 2016)
- Three new chapters have been added and others have been updated and/or reorganized.
- One chapter is dedicated to open-world learning.
- Any AI system (e.g., a chatbot or a self-driving car) that cannot learn in deployment (e.g., while chatting or driving) in the real-world open environment is not truly intelligent.
Learning on the Job in the Open World. Invited talk given at the Continual Learning Workshop @ ICML-2020, July 17, 2020.
Motivation: Sooner or later, AI agents will need to explore and learn
by themselves in the real world. They cannot forever depend
on manually labeled data. The real world is open and dynamic, and
full of unknowns. AI agents must be able to detect the unknowns and
learn them in a self-supervised manner. They should not make
the closed-world assumption any more.
Open-world learning (OWL), also known as open-world recognition, open-world classification, or open-world AI, is becoming increasingly important as learning agents such as chatbots and self-driving cars are deployed in open and dynamic real-world environments, where an agent cannot assume that what it encounters contains only what it has learned previously. For example, a chatbot cannot assume that it knows everything a user may say, and a self-driving car cannot assume that the real world contains only things it has seen and learned before. The core of open-world learning or open-world AI is recognizing unknowns and learning them so that the AI agent becomes more and more knowledgeable.
Classic machine learning makes the closed-world assumption, i.e., the classes that the agent sees in training are the only ones it will see in testing (no new objects or classes can appear in testing) (Fei and Liu 2016). A more realistic scenario is to expect unseen classes during testing (the open world). In this case, the goal is to design a learning algorithm that can classify instances of the known/seen classes into their respective classes and also reject/detect instances from unknown/unseen classes. This problem is called open-world learning (or open-world classification). Apart from detecting the unseen classes, open-world learning should also incrementally or continually learn the new classes.
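The classify-or-reject behavior described above can be sketched as follows. This is only a minimal illustration, assuming a model that outputs one raw score (logit) per seen class and a single fixed rejection threshold. The function name `classify_or_reject`, the class names, the scores, and the threshold of 0.5 are all hypothetical; this is not the method of any paper cited on this page.

```python
import math


def sigmoid(x):
    """Map a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify_or_reject(logits, classes, threshold=0.5):
    """Return the best-scoring seen class, or "unknown" if no class is confident.

    Each seen class is scored independently (one sigmoid per class), so all
    scores can be low at once. That is what allows an instance to be rejected
    as belonging to none of the seen classes.
    """
    scores = [sigmoid(z) for z in logits]
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return "unknown"  # rejected: likely from an unseen class
    return classes[best]


# Hypothetical seen classes and hypothetical model outputs.
seen_classes = ["sports", "politics", "tech"]
confident = classify_or_reject([2.1, -1.3, -0.7], seen_classes)
rejected = classify_or_reject([-1.8, -2.2, -1.1], seen_classes)
print(confident, rejected)
```

In the first call one class scores well above the threshold, so the instance is assigned to it. In the second call every class scores below the threshold, so the instance is rejected. A softmax over the seen classes would not permit this, because its scores always sum to one.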
Tasks of open-world learning (OWL)
Open-world learning in dialogue systems: We have been working on this topic for the past two years because in dialogues unknown or new things happen all the time. See our 2020 and 2021 papers below.
- Task 1: learn a classifier that classifies test instances belonging to the training/seen
classes used in learning, and detects/rejects instances that do not belong to
the training/seen classes (the DOC algorithm (EMNLP-2017)
is quite powerful for this task for both text and images).
- Note: Rejection here is not the same as traditional outlier/anomaly detection. Traditional outlier detection typically detects outliers
from a given dataset based on some unsupervised learning method, while
rejection in open-world learning detects unseen class instances ("outliers") on the
fly in testing using the classifier trained with only seen class examples (supervised learning),
and the classifier also does classification of test instances from the seen classes.
- Task 2: ask the user to provide the unseen classes in the
rejected instances or automatically discover the unseen classes based
on the knowledge learned in the past (Shu et al. 2018).
- Task 3: incrementally learn the new/unseen classes
(Fei et al. 2016; Xu et al. 2019)
without retraining from scratch and without catastrophic forgetting.
In the process, the system learns and accumulates more and more knowledge (Fei et al. 2016).
The learner is self-motivated and it knows what it does and does not know. With intelligent systems such as chatbots and self-driving cars increasingly facing the real-world open (unknown) environments, we can no longer make the closed world assumption.
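The three tasks above can be illustrated together with a toy sketch. It assumes instances are already represented as numeric feature vectors and uses a nearest-class-centroid rule with a distance cutoff. This is a stand-in chosen for brevity, not the approach of Fei et al. 2016, Shu et al. 2018, or Xu et al. 2019. The class name `CentroidOpenWorldClassifier`, the `reject_distance` parameter, and the example data are all hypothetical.

```python
import math


class CentroidOpenWorldClassifier:
    """Toy open-world learner: one centroid per class, distance-based rejection."""

    def __init__(self, reject_distance):
        self.reject_distance = reject_distance
        self.sums = {}    # class label -> per-dimension sum of its vectors
        self.counts = {}  # class label -> number of vectors seen

    def learn(self, label, vectors):
        """Add examples of a (possibly new) class without touching other classes."""
        for v in vectors:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(v)
                self.counts[label] = 0
            self.sums[label] = [s + x for s, x in zip(self.sums[label], v)]
            self.counts[label] += 1

    def centroid(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, v):
        """Return the nearest class label, or None to reject as unseen."""
        best_label, best_dist = None, math.inf
        for label in self.sums:
            d = math.dist(v, self.centroid(label))
            if d < best_dist:
                best_label, best_dist = label, d
        if best_label is None or best_dist > self.reject_distance:
            return None
        return best_label


clf = CentroidOpenWorldClassifier(reject_distance=1.5)
clf.learn("cat", [[0.0, 0.0], [0.2, 0.1]])
clf.learn("dog", [[5.0, 5.0], [5.2, 4.9]])

before = clf.predict([10.0, 0.0])   # Task 1: far from every seen class
clf.learn("bird", [[10.0, 0.0], [9.8, 0.2]])  # Tasks 2-3: label given, class added
after = clf.predict([10.0, 0.0])
still_cat = clf.predict([0.1, 0.0])
print(before, after, still_cat)
```

Because each class keeps its own statistics, adding a class only creates a new centroid and leaves the existing ones unchanged, so nothing is retrained from scratch. Neural classifiers do not get this property for free, which is why avoiding catastrophic forgetting is listed as a requirement of Task 3.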
Textbook: Zhiyuan Chen and Bing Liu. Lifelong Machine Learning. Morgan & Claypool, 2018 (2nd edition), 2016 (1st edition).
- Bing Liu and Sahisnu Mazumder. Lifelong and Continual Learning Dialogue Systems: Learning during Conversation. to appear in Proceedings of AAAI-2021. 2021.
- Sahisnu Mazumder, Bing Liu, Shuai Wang, and Sepideh Esmaeilpour.
An Application-Independent Approach to Building Task-Oriented Chatbots with Interactive Continual Learning. to appear at NeurIPS-2020 Workshop on Human in the Loop Dialogue Systems (HLDS-2020). 2020.
- Sahisnu Mazumder, Bing Liu, Nianzu Ma, Shuai Wang. Continuous and Interactive Factual Knowledge Learning in Verification Dialogues. to appear at NeurIPS-2020 Workshop on Human And Machine in-the-Loop Evaluation and Learning Strategies (HAMLETS-2020). 2020.
- Bing Liu. Learning on the Job: Online Lifelong and Continual Learning. Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-2020), Feb 7-12, 2020, New York City. (This work was done while I was on leave at Peking University.)
- Hu Xu, Bing Liu, Lei Shu and P. Yu. Open-world Learning and Application to Product Classification. to appear in Proceedings of the Web Conference (formerly known as the WWW conference), San Francisco, May 13-17, 2019.
- Lei Shu, Hu Xu, Bing Liu. Unseen Class Discovery in Open-world Classification. arXiv:1801.05609 [cs.LG], 2018.
- Lei Shu, Hu Xu, Bing Liu. DOC: Deep Open Classification of Text Documents. Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP-2017, oral presentation short paper), September 7–11, 2017, Copenhagen, Denmark.
- Geli Fei, Shuai Wang, and Bing Liu. 2016. Learning Cumulatively to Become More Knowledgeable. Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2016), August 13-17, San Francisco, USA.
- Geli Fei and Bing Liu. 2016. Breaking the Closed World Assumption in Text Classification. Proceedings of NAACL-HLT 2016, June 12-17, San Diego, USA.
Created on Jan 24, 2018 by Bing Liu.