Work in NLP has shifted from hand-built grammars, which must explicitly cover every sentence structure and break down on unexpected inputs, to more robust statistical parsing.
However, grammars that involve precise semantics are still largely hand-built (e.g. Carpenter, 1998; Copestake and Flickinger, 2000). We aim to extend this robustness trend to the semantics.
We start with the compositional semantics framework of Blackburn and Bos (2000) and Bos (2001) and modify it to achieve greater robustness and coverage.²
One difference is that our lexicon is kept
very small and includes only a few words with
special semantic entries (like pronouns, connectives,
and numbers). Open-category words
come with their part-of-speech information in
the parse trees (e.g. (NN dog)), so their semantics
can be obtained using generic semantic templates
(but cf. §3.5).
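As a rough illustration, such a template lookup might be sketched as follows; the `template_semantics` function, the chosen POS tags, and the string-based lambda notation are our own hypothetical simplifications, not the paper's actual (Lisp) implementation:

```python
# Hypothetical sketch of generic semantic templates keyed by POS tag;
# semantics are encoded here as strings in an ad-hoc lambda notation.
def template_semantics(pos, word):
    """Build the semantics of an open-category leaf such as (NN dog)."""
    templates = {
        "NN": lambda w: f"lam(x, {w}(x))",  # common noun: predicate over entities
        "JJ": lambda w: f"lam(x, {w}(x))",  # adjective: also an entity predicate
        "VB": lambda w: f"lam(e, {w}(e))",  # verb: predicate over an event variable
    }
    if pos not in templates:
        raise KeyError(f"no generic template for POS tag {pos}")
    return templates[pos](word)

print(template_semantics("NN", "dog"))  # lam(x, dog(x))
```

Because the templates depend only on the POS tag, any open-category word seen at parse time gets some semantics, without a hand-built lexical entry.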
In classic rule-to-rule systems of semantics, such as that of Blackburn and Bos (2000), each syntactic rule has a separate semantic combination rule,
and so the system completely fails on unseen
syntactic structures. The main distinguishing
goal of our approach is to develop a more robust
process that does not need to explicitly specify
how to cover every bit of every sentence. The
system incorporates a few initial ideas in this
direction.
First, role and argument-structure information for verbs is expensive to obtain, and it is unreliable in natural texts anyway. So to deal with
verbs and VPs robustly, their semantics in our
system exports only an event variable rather
than variables for the subject, the direct object,
etc. VP modifiers (such as PPs and ADVPs) combine with the VP by being applied to the exported event variable. NP modifiers (including the sentence subject) combine with the event variable through generic roles: subj, np1, np2, etc. The resulting generic representations are
suitable in the puzzles domain because usually
only the relation between objects is important
and not their particular roles in the relation.
This is true for other tasks as well, including
some broad-coverage question answering.
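The construction above can be sketched as follows; `clause_semantics` and its output notation are hypothetical illustrations of exporting only an event variable and attaching NPs through generic roles:

```python
# Hypothetical sketch of the event-based clause semantics: the verb
# exports only an event variable e, and NPs attach through the generic
# roles subj, np1, np2, ...  Names and output format are our own.
def clause_semantics(verb, nps):
    """Combine a verb with its NPs via generic roles on the event variable."""
    conjuncts = [f"{verb}(e)"]
    roles = ["subj"] + [f"np{i}" for i in range(1, len(nps))]
    for role, np in zip(roles, nps):
        conjuncts.append(f"{role}(e, {np})")
    return " & ".join(conjuncts)

# Only the relation between the objects is recorded, not their exact roles:
print(clause_semantics("occupy", ["mary", "room3"]))
# occupy(e) & subj(e, mary) & np1(e, room3)
```

No verb-specific argument structure is consulted, so the same rule covers verbs never seen before.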
All NPs are analyzed as generalized quantifiers,
but a robust compositional analysis for
the internal semantics of NPs remains a serious
challenge. For example, the NP “three rooms” should be analyzed as Q(num(3), x, room(x), ..), but the word “three” by itself does not contribute the quantifier – compare with “at least three rooms”, Q(≥3, x, room(x), ..). Yet another case is “the three rooms” (which presupposes a group g such that g ⊆ room ∧ |g| = 3). The system currently handles a number of NP structures by scanning the NP left-to-right to identify the important elements. This may make it easier than a strictly compositional analysis to extend the coverage to additional cases.

²Our system uses a reimplementation in Lisp rather than their Prolog code.
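A minimal sketch of such a left-to-right scan, assuming a simple token-list input; the quantifier forms follow the examples above, but the token handling and the number map are our own simplifications:

```python
# Hypothetical sketch of the left-to-right NP scan.  The output form
# Q(quant, x, restrictor, ..) follows the examples in the text; the
# token handling and number-word map are our own simplifications.
NUM = {"one": "1", "two": "2", "three": "3", "four": "4"}

def np_semantics(tokens):
    """Scan an NP's tokens left to right and pick out the key elements."""
    quant, i = "exists", 0                        # default determiner
    if tokens[:2] == ["at", "least"] and tokens[2] in NUM:
        quant, i = f">={NUM[tokens[2]]}", 3
    elif tokens[0] == "the" and len(tokens) > 2 and tokens[1] in NUM:
        quant, i = f"group({NUM[tokens[1]]})", 2  # presupposed group
    elif tokens[0] in NUM:
        quant, i = f"num({NUM[tokens[0]]})", 1
    noun = tokens[i].rstrip("s")                  # naive singularization
    return f"Q({quant}, x, {noun}(x), ..)"

print(np_semantics(["three", "rooms"]))                 # Q(num(3), x, room(x), ..)
print(np_semantics(["at", "least", "three", "rooms"]))  # Q(>=3, x, room(x), ..)
print(np_semantics(["the", "three", "rooms"]))          # Q(group(3), x, room(x), ..)
```

Adding a new NP pattern means adding one more branch to the scan, rather than reworking a compositional derivation for every word in the NP.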
All other cases are handled by a flexible combination process. If a node has a single child, the child's semantics is copied to the parent. With more
children, all combinations of applying the semantics
of one child to its siblings are tried,
until an application does not raise a type error
(variables are typed to support type checking).
This makes it easier to extend the coverage
to new grammatical constructs, because usually
only the lexical entry needs to be specified, and the combination process takes care of applying it correctly in the parse tree.
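The trial-and-error application can be sketched as follows; the `Sem` class and its type encoding are hypothetical, since the paper specifies only that variables are typed to support type checking:

```python
# Hypothetical sketch of the flexible combination step: with several
# children, try applying each child's semantics to a sibling until an
# application passes the type check.
class SemTypeError(Exception):
    pass

class Sem:
    def __init__(self, typ, term):
        self.typ, self.term = typ, term      # e.g. ("et", "q") for (e->t)->q

    def apply(self, arg):
        # A function type is a pair (argument type, result type).
        if not (isinstance(self.typ, tuple) and self.typ[0] == arg.typ):
            raise SemTypeError("argument type mismatch")
        return Sem(self.typ[1], f"{self.term}({arg.term})")

def combine(children):
    """Copy up a single child; otherwise return the first type-correct application."""
    if len(children) == 1:
        return children[0]
    for f in children:
        for a in children:
            if f is a:
                continue
            try:
                return f.apply(a)
            except SemTypeError:
                continue
    raise SemTypeError("no type-correct combination found")

det = Sem(("et", "q"), "some")    # determiner: (e->t) -> quantifier
noun = Sem("et", "dog")           # noun: e -> t
print(combine([noun, det]).term)  # some(dog)
```

Note that the children need not be tried in surface order: the type check, not a construction-specific rule, decides which child acts as the function.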