Article

Named Entity Recognition Using Web Document Corpus

W. Karaa.
International Journal of Managing Information Technology (IJMIT), 3 (1): 46 to 55 (February 2011)

Abstract

This paper introduces an approach to named entity recognition in textual corpora. A Named Entity (NE) can be a named location, person, organization, date, time, etc., characterized by instances. An NE is found in texts accompanied by contexts: words to the left or right of the NE. The work mainly aims at identifying the contexts that indicate the NE’s nature. For example, the occurrence of the word "President" in a text means that this word, or context, may be followed by the name of a president, as in "President Obama". Likewise, a word preceded by the string "footballer" is likely to be the name of a footballer. NE recognition may be viewed as a classification task in which every word is assigned to an NE class according to its context. The aim of this study is therefore to identify and classify the contexts that are most relevant for recognizing an NE, namely those most frequently found with the NE. A learning approach is then suggested that uses a training corpus of web documents constructed from learning examples. Frequency representations and modified tf-idf representations are used to calculate context weights based on context frequency, learning example frequency, and document frequency in the corpus.
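The context-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the modified tf-idf formula, so the code below substitutes a plain tf-idf weight, and it assumes single-token learning examples and whitespace tokenization. The names `extract_contexts`, `context_weights`, `docs`, and `examples` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def extract_contexts(tokens, examples):
    """Collect the words immediately left ("L") and right ("R") of each known NE instance."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok in examples:
            if i > 0:
                contexts.append(("L", tokens[i - 1].lower()))
            if i + 1 < len(tokens):
                contexts.append(("R", tokens[i + 1].lower()))
    return contexts


def context_weights(documents, examples):
    """Weight each context by a plain tf-idf score (an assumption, not the paper's formula).

    tf: how often the word occurs next to a learning example in the corpus.
    df: number of documents that contain the word anywhere.
    """
    tf = Counter()
    df = Counter()
    for doc in documents:
        tokens = doc.split()
        tf.update(extract_contexts(tokens, examples))
        df.update({tok.lower() for tok in tokens})
    n_docs = len(documents)
    return {ctx: count * math.log(n_docs / df[ctx[1]]) for ctx, count in tf.items()}


# Toy corpus standing in for the web documents; "Obama" is the only learning example.
docs = [
    "President Obama spoke to the press",
    "the President Obama visited the city",
    "yesterday the Obama family left the city",
    "the weather in the city was fine",
    "the footballer scored in the match",
]
examples = {"Obama"}
weights = context_weights(docs, examples)

for ctx, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(ctx, round(w, 3))
```

On this toy corpus the left context "president" receives the highest weight, while "the", which occurs in every document, receives a weight of zero. This shows why a document-frequency term helps separate informative contexts from common words.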

BibTeX key: noauthororeditor
entry type: article
year: 2011
month: February
journal: International Journal of Managing Information Technology (IJMIT)
number: 1
pages: 46 to 55
volume: 3
language: English
Document: http://airccse.org/journal/ijmit/papers/3111ijmit04.pdf
