Abstract
An important characteristic of written English text is the abundance of noun
compounds - sequences of nouns acting as a single noun, e.g., colon cancer
tumor suppressor protein. Although domain experts eventually master such
compounds, their interpretation poses a major challenge for automated analysis. Understanding
noun compounds' syntax and semantics is important for many natural language
applications, including question answering, machine translation, information
retrieval, and information extraction. I address the problem of noun compound
syntax by means of novel, highly accurate unsupervised and lightly supervised
algorithms using the Web as a corpus and search engines as interfaces to that
corpus. Traditionally, the Web has been viewed as a source of page hit counts,
used as estimates of n-gram word frequencies. I extend this approach by
introducing novel surface features and paraphrases, which yield
state-of-the-art results for the task of noun compound bracketing. I also show
how these kinds of features can be applied to other structural ambiguity
problems, like prepositional phrase attachment and noun phrase coordination. I
address noun compound semantics by automatically generating paraphrasing verbs
and prepositions that make explicit the hidden semantic relations between the
nouns in a noun compound. I also demonstrate how these paraphrasing verbs can
be used to solve various relational similarity problems, and how paraphrasing
noun compounds can improve machine translation.