Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
P. Turney. Machine Learning: ECML 2001, volume 2167 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2001. doi:10.1007/3-540-44795-4_42
Abstract
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
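The following is a minimal Python sketch of the scoring idea the abstract describes. It assumes the simplest PMI-based score, hits(problem AND choice) / hits(choice). Because hits(problem) and the total page count are the same for every choice, this ratio ranks the choices in the same order as PMI(problem, choice). The `hits` helper, the query syntax, and every count below are hypothetical stand-ins for Web search-engine queries. They are invented for illustration and are not data from the paper. Any refinements in the paper's actual queries are not modeled.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical hit counts, invented for illustration; NOT figures from the paper.
# In PMI-IR these would come from queries to a Web search engine.
HYPOTHETICAL_HITS: Dict[str, int] = {
    "levied": 5_000,
    "imposed": 10_000,
    "believed": 50_000,
    "requested": 40_000,
    "correlated": 8_000,
    "levied AND imposed": 800,
    "levied AND believed": 300,
    "levied AND requested": 400,
    "levied AND correlated": 20,
}
HYPOTHETICAL_TOTAL_PAGES = 1_000_000_000  # assumed index size, also invented


def hits(query: str) -> int:
    """Hypothetical stand-in for a search-engine hit count."""
    return HYPOTHETICAL_HITS.get(query, 0)


def pmi(hits_xy: int, hits_x: int, hits_y: int, total: int) -> float:
    """log2(p(x, y) / (p(x) * p(y))), estimating each probability as hits / total."""
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return float("-inf")
    return math.log2(hits_xy * total / (hits_x * hits_y))


def score(problem: str, choice: str, hits_fn: Callable[[str], int]) -> float:
    """hits(problem AND choice) / hits(choice).

    hits(problem) and the total are constant across choices, so ranking by
    this ratio gives the same order as ranking by PMI(problem, choice).
    """
    denom = hits_fn(choice)
    return hits_fn(f"{problem} AND {choice}") / denom if denom else 0.0


def choose_synonym(problem: str, choices: Sequence[str],
                   hits_fn: Callable[[str], int] = hits) -> str:
    """Return the choice with the highest score."""
    return max(choices, key=lambda c: score(problem, c, hits_fn))


if __name__ == "__main__":
    options = ["imposed", "believed", "requested", "correlated"]
    print(choose_synonym("levied", options))  # imposed, given these made-up counts
```

Scoring by the ratio rather than full PMI avoids needing hits(problem) or the size of the search engine's index, which only shift every choice's PMI by the same constant.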
%0 Book Section
%1 turney2001mining
%A Turney, Peter
%B Machine Learning: ECML 2001
%D 2001
%E De Raedt, Luc
%E Flach, Peter
%I Springer Berlin / Heidelberg
%K 10_year_award 2001 2011 Mining award ecml_pkdd_2011 pmi-ir synonyms toefl versus web
%P 491-502
%T Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
%U http://dx.doi.org/10.1007/3-540-44795-4_42
%V 2167
%X This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
%@ 978-3-540-42536-6
@incollection{turney2001mining,
abstract = {This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).},
added-at = {2011-09-03T12:02:05.000+0200},
affiliation = {National Research Council of Canada, Institute for Information Technology, M-50 Montreal Road, Ottawa, Ontario, Canada K1A 0R6},
author = {Turney, Peter},
biburl = {https://www.bibsonomy.org/bibtex/262b5ae40c43d87930274e31020b42cac/ecml_pkdd_2011},
booktitle = {Machine Learning: ECML 2001},
editor = {De Raedt, Luc and Flach, Peter},
interhash = {8e5ac4302379bb3e66512a3696669bcb},
intrahash = {62b5ae40c43d87930274e31020b42cac},
isbn = {978-3-540-42536-6},
keyword = {Computer Science},
keywords = {10_year_award 2001 2011 Mining award ecml_pkdd_2011 pmi-ir synonyms toefl versus web},
doi = {10.1007/3-540-44795-4_42},
pages = {491--502},
publisher = {Springer Berlin / Heidelberg},
series = {Lecture Notes in Computer Science},
timestamp = {2011-09-03T12:02:05.000+0200},
title = {Mining the Web for Synonyms: {PMI-IR} versus {LSA} on {TOEFL}},
url = {http://dx.doi.org/10.1007/3-540-44795-4_42},
volume = 2167,
year = 2001
}