Lexical semantics to infer discourse relations and domain knowledge
Natural language (NL) interfaces to instructional and educational
applications need domain knowledge about how to act in a domain, and
communication knowledge about how to talk about acting in that domain.
Deriving such knowledge by hand is time consuming and error prone, and
it is ultimately the reason why NL interfaces are often domain
dependent. The work supported by the NSF CAREER award seeks to overcome
this limitation by developing a methodology, algorithms, and tools to
semi-automatically derive domain and communication knowledge from text
and dialogues.
This work entails three steps.
- We have automatically built a corpus annotated with semantic
information. To do so, we have coupled the robust parser LCFLEX (part
of the CARMEL Workbench) with two lexical semantic lexica, VerbNet for
verbs and CoreLex for nouns. Lexical semantics is a crucial component
of meaning that accounts for inferences engendered by action verbs. In
this work, we generalize that idea and exploit lexical semantics to
drive the acquisition of discourse relations and domain knowledge.
- We have developed a coding scheme to encode relations between
actions. This coding scheme is being used to annotate the corpus that
was automatically labeled with semantic information. We are also
exploring whether this annotation can be done automatically by means of
a discourse parser; specifically, we are exploring the use of inductive
logic programming to infer discourse relations between action
descriptions.
- We will develop a novel acquisition engine to acquire knowledge from the
corpus annotated with task structure and discourse / dialogue
structure. The acquisition engine will couple machine learning
techniques with an inference engine based on description logic. A
first-order rule learner will infer portions of schemes; the
description-logic-based system will compose those portions into
complete schemes. This last step will be semi-automatic: a human
reviewer will be called upon to confirm or correct additions to the
Knowledge Base.
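
The first step above, coupling a parser's output with a verb lexicon and
a noun lexicon, can be illustrated with a minimal sketch. It is not the
project's implementation and does not use LCFLEX, VerbNet, or CoreLex
themselves: the two dictionaries, the class labels in them, the Token
type, and the annotate function are all hypothetical stand-ins, and the
input is assumed to be already parsed into lemmas with part-of-speech
tags.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexica standing in for a verb lexicon and a noun
# lexicon. The entries and class labels are invented for illustration
# and are NOT taken from VerbNet or CoreLex.
VERB_LEXICON = {"remove": "removal", "put": "placement"}
NOUN_LEXICON = {"filter": "artifact", "water": "substance"}


@dataclass
class Token:
    lemma: str
    pos: str  # "V" for verbs, "N" for nouns, anything else is ignored
    semantic_class: Optional[str] = None


def annotate(tokens: List[Token]) -> List[Token]:
    """Return copies of the tokens with a semantic class attached
    wherever the matching toy lexicon has an entry for the lemma."""
    annotated = []
    for tok in tokens:
        if tok.pos == "V":
            cls = VERB_LEXICON.get(tok.lemma)
        elif tok.pos == "N":
            cls = NOUN_LEXICON.get(tok.lemma)
        else:
            cls = None  # no lexicon covers this part of speech
        annotated.append(Token(tok.lemma, tok.pos, cls))
    return annotated


# Assumed parser output for an instruction such as "Remove the filter".
parsed = [Token("remove", "V"), Token("the", "DET"), Token("filter", "N")]
for tok in annotate(parsed):
    print(tok.lemma, tok.pos, tok.semantic_class)
```

Words missing from a lexicon are left unannotated rather than guessed,
which keeps gaps visible for later manual review.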