@lee_peck

Information-theoretic co-clustering

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 89--98. New York, NY, USA, ACM, (2003)
DOI: http://doi.acm.org/10.1145/956750.956764

Abstract

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory---the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
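The objective described in the abstract can be illustrated with a small sketch. It treats a toy contingency table as a joint distribution, aggregates it under a given row and column clustering, and compares the mutual information before and after. This only evaluates the objective for fixed clusterings and is not the authors' algorithm; the function names (`mutual_information`, `normalize`, `cocluster_joint`) and the toy table are assumptions made for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a list of rows."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


def normalize(table):
    """Turn a table of counts into an empirical joint distribution."""
    total = sum(sum(row) for row in table)
    return [[v / total for v in row] for row in table]


def cocluster_joint(joint, row_labels, col_labels, k, l):
    """Sum p(x, y) over each (row cluster, column cluster) block."""
    agg = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += p
    return agg


# Hypothetical 4x4 co-occurrence table with two obvious blocks.
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
p = normalize(counts)

# A co-clustering that follows the block structure, and one that cuts across it.
good = cocluster_joint(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = cocluster_joint(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

mi_full = mutual_information(p)
mi_good = mutual_information(good)
mi_bad = mutual_information(bad)

print(f"I(X;Y) original:        {mi_full:.3f} bits")
print(f"preserved (block-wise): {mi_good:.3f} bits")
print(f"preserved (mixed):      {mi_bad:.3f} bits")
```

In this toy case the block-aligned clustering preserves all of the original mutual information, while the mixed clustering preserves none. Aggregation can never increase mutual information, so the difference between the original and clustered values is a non-negative loss that a co-clustering method would try to minimize.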

Description

Information-theoretic co-clustering

Links and resources

Community

  • @becker
  • @cdevries
  • @rabeeh
  • @infospace
  • @msn
  • @hotho
  • @mmcgloho
  • @dblp
  • @r.b.
  • @folke
  • @lee_peck
  • @marymcglo