@bsmyth

The Author-Topic Model for Authors and Documents

, , , and . 20th Conference on Uncertainty in Artificial Intelligence, 21, Banff Park Lodge, Banff, Canada, (July 2004)

Abstract

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
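The abstract names Gibbs sampling but does not spell out the sampler. Below is a minimal sketch, assuming the standard collapsed Gibbs scheme for this kind of model: each word token is assigned jointly an author (one of the document's co-authors) and a topic, and the author-topic and topic-word distributions are read off the smoothed counts afterwards. The function names `gibbs_author_topic`, `entropy`, and `symmetric_kl`, the hyperparameter defaults `alpha` and `beta`, and the toy corpus are illustrative assumptions, not taken from the paper. It uses only the Python standard library, so it is meant for small examples rather than corpora the size of CiteSeer.

```python
import math
import random


def gibbs_author_topic(docs, doc_authors, num_authors, vocab_size,
                       num_topics, alpha=0.5, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler sketch for an author-topic model (assumed form).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_authors: list of author-id lists, one per document.
    Returns (theta, phi) where theta[a][t] approximates P(topic t | author a)
    and phi[t][w] approximates P(word w | topic t).
    """
    rng = random.Random(seed)
    # Count tables: word-topic and author-topic, plus their totals.
    wt = [[0] * num_topics for _ in range(vocab_size)]
    at = [[0] * num_topics for _ in range(num_authors)]
    topic_total = [0] * num_topics
    author_total = [0] * num_authors
    assign = []  # (author, topic) pair for every token

    def update(w, a, t, delta):
        wt[w][t] += delta
        at[a][t] += delta
        topic_total[t] += delta
        author_total[a] += delta

    # Random initialisation: each token gets a random co-author and topic.
    for d, words in enumerate(docs):
        row = []
        for w in words:
            a = rng.choice(doc_authors[d])
            t = rng.randrange(num_topics)
            update(w, a, t, +1)
            row.append((a, t))
        assign.append(row)

    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                a, t = assign[d][i]
                update(w, a, t, -1)  # exclude the current token
                # Unnormalised joint conditional over (author, topic) pairs.
                pairs, weights = [], []
                for a2 in doc_authors[d]:
                    for t2 in range(num_topics):
                        p_word = (wt[w][t2] + beta) / (topic_total[t2] + vocab_size * beta)
                        p_topic = (at[a2][t2] + alpha) / (author_total[a2] + num_topics * alpha)
                        pairs.append((a2, t2))
                        weights.append(p_word * p_topic)
                a, t = rng.choices(pairs, weights=weights)[0]
                update(w, a, t, +1)
                assign[d][i] = (a, t)

    # Smoothed point estimates from the final counts (each row sums to 1).
    theta = [[(at[a][t] + alpha) / (author_total[a] + num_topics * alpha)
              for t in range(num_topics)] for a in range(num_authors)]
    phi = [[(wt[w][t] + beta) / (topic_total[t] + vocab_size * beta)
            for w in range(vocab_size)] for t in range(num_topics)]
    return theta, phi


def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution, e.g. an author's theta row."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def symmetric_kl(p, q):
    """Symmetric KL divergence (sum of both directions); assumes strictly positive entries."""
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Hypothetical toy corpus: 3 documents, 2 authors, 5-word vocabulary.
    docs = [[0, 1, 0, 2], [3, 4, 3], [0, 2, 4]]
    doc_authors = [[0], [1], [0, 1]]
    theta, phi = gibbs_author_topic(docs, doc_authors, num_authors=2,
                                    vocab_size=5, num_topics=2, iters=50)
    print("author entropies:", [round(entropy(row), 3) for row in theta])
    print("author dissimilarity:", round(symmetric_kl(theta[0], theta[1]), 3))
```

Because the Dirichlet smoothing keeps every entry of `theta` strictly positive, `symmetric_kl` is always defined on its rows. Giving each document its own single, unique pseudo-author makes each `theta` row a per-document topic distribution, which corresponds to the LDA special case the abstract mentions.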

Description

my barry smyth

community

  • @marie_brei
  • @schwemmlein
  • @jaeschke
  • @bsmyth
  • @schmitz
  • @albinzehe
  • @ldietz
  • @gregoryy
  • @dblp
  • @folke
  • @l.sz.
  • @josephausterwei