A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean

Information Retrieval, 4(2): 115-132 (2001)

Abstract

In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limitations when indexing diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort, while achieving a performance level similar to that of the manual linguistic approach. We also present a new filtering method to solve the problems of compound noun over-generation and data sparseness.
