The natural language community has struggled for years to develop computational models of text structure. Such models are critical both for interpretation and for generation of natural language texts. Unfortunately, most current systems rely on manually crafted representations of text structure.
In this talk, I will discuss how to automatically induce the content structure of a text. A content model captures the topics a text addresses and the order in which these topics appear. I will present an effective method for learning content models from unannotated domain-specific documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. Incorporating these models into information ordering and summarization applications yields substantial improvements over previously proposed methods.
This is joint work with Lillian Lee of Cornell University.
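As a rough illustration of the content-model idea described in the abstract, the sketch below treats topics as Hidden Markov Model states and sentences as observations, then uses standard Viterbi decoding to recover the most likely topic sequence for a short text. It is not the method presented in the talk: the topic names, the probabilities, the per-topic word lists, and the function names are all made up for illustration, and the parameters are set by hand rather than learned from unannotated documents.

```python
import math

# Hypothetical toy content model: topics are HMM states, sentences are observations.
# Every name and number below is an illustrative assumption, not learned from data.
TOPICS = ["location", "damage", "relief"]

# Probability that a text opens with each topic.
START = {"location": 0.8, "damage": 0.1, "relief": 0.1}

# Topic-to-topic transition probabilities; these encode the typical topic order.
TRANS = {
    "location": {"location": 0.3, "damage": 0.6, "relief": 0.1},
    "damage":   {"location": 0.1, "damage": 0.4, "relief": 0.5},
    "relief":   {"location": 0.1, "damage": 0.2, "relief": 0.7},
}

# Per-topic unigram word probabilities (a crude stand-in for a language model).
WORDS = {
    "location": {"earthquake": 0.4, "struck": 0.3, "city": 0.3},
    "damage":   {"buildings": 0.4, "collapsed": 0.3, "injured": 0.3},
    "relief":   {"aid": 0.4, "rescue": 0.3, "teams": 0.3},
}
UNSEEN = 1e-3  # small floor for unseen words; not a properly normalized smoothing


def sentence_logprob(topic, sentence):
    """Log-probability of a sentence under one topic's unigram model."""
    return sum(math.log(WORDS[topic].get(w, UNSEEN)) for w in sentence.lower().split())


def viterbi(sentences):
    """Return the most likely topic label for each sentence, in order."""
    # Each column maps topic -> (best log-prob of a path ending here, previous topic).
    columns = [{
        s: (math.log(START[s]) + sentence_logprob(s, sentences[0]), None)
        for s in TOPICS
    }]
    for sent in sentences[1:]:
        prev = columns[-1]
        col = {}
        for s in TOPICS:
            score, back = max((prev[r][0] + math.log(TRANS[r][s]), r) for r in TOPICS)
            col[s] = (score + sentence_logprob(s, sent), back)
        columns.append(col)

    # Backtrace from the best final topic.
    state = max(TOPICS, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    text = ["Earthquake struck city", "Buildings collapsed", "Rescue teams aid"]
    print(viterbi(text))
```

The same scoring could in principle be used to compare alternative sentence orderings by their total log-probability, which is one way to connect such a model to information ordering; that extension is not shown here.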
Regina Barzilay is an assistant professor in the EECS department at MIT. She obtained her PhD from Columbia University in 2003, and she was a postdoc at Cornell University in 2003-2004. She has just been named one of 35 top young scientists in North America by MIT Technology Review magazine.
Peter C. Nelson, Ph.D.
Professor and Head
Department of Computer Science
University of Illinois at Chicago
312-413-2911 (my assistant Ms. Imelda Baker)