Distributed Representations of Sentences and Documents

Q. Le and T. Mikolov. Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 1188–1196. Beijing, China, PMLR, (June 2014)

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common representations is bag-of-words. Despite their popularity, bag-of-words models have two major weaknesses: they lose the ordering of the words and they also ignore the semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. This algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that our technique outperforms bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
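
The "equally distant" remark in the abstract can be illustrated with a small sketch. It is not taken from the paper: the three-word vocabulary, the one-hot encoding, and the choice of Euclidean distance are assumptions made here for illustration only. Under those assumptions, every pair of distinct words ends up the same distance apart, so the representation cannot express that "powerful" is closer in meaning to "strong" than to "Paris".

```python
import math

# Hypothetical three-word vocabulary, chosen to mirror the abstract's example.
vocab = ["powerful", "strong", "Paris"]

def one_hot(word):
    """Bag-of-words vector for a single word: 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_powerful_strong = euclidean(one_hot("powerful"), one_hot("strong"))
d_powerful_paris = euclidean(one_hot("powerful"), one_hot("Paris"))
d_strong_paris = euclidean(one_hot("strong"), one_hot("Paris"))

# All three pairwise distances are identical (the square root of 2).
print(d_powerful_strong, d_powerful_paris, d_strong_paris)
```

A learned dense representation, as the abstract proposes, is not constrained in this way, since its coordinates are trained rather than fixed by vocabulary index.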

Links and resources

BibTeX key: le2014distributed
entry type: inproceedings
address: Beijing, China
booktitle: Proceedings of the 31st International Conference on Machine Learning
year: 2014
month: jun
number: 2
pages: 1188--1196
publisher: PMLR
series: Proceedings of Machine Learning Research
volume: 32
pdf: http://proceedings.mlr.press/v32/le14.pdf
Document: https://proceedings.mlr.press/v32/le14.html
