Abstract

We define the relevant information in a signal x ∈ X as the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. We formalize the problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a 'bottleneck' formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. The approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
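The re-estimation scheme described in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the authors' code, assuming a discrete joint distribution p(x, y) given as a table in which every x has p(x) > 0, a fixed number of codewords, and a fixed trade-off parameter beta (the Lagrange multiplier of the standard formulation, which trades the compression term I(X; X̃) against the preserved information I(X̃; Y)). It alternates the encoder update p(x̃|x) ∝ p(x̃) exp(−beta · D_KL[p(y|x) ‖ p(y|x̃)]) with recomputing p(x̃) and the decoder p(y|x̃); the KL divergence plays the role of the distortion d(x, x̃). The function names, the random initialisation and the fixed iteration count are illustrative assumptions.

```python
import numpy as np


def mutual_information(p_ab):
    """I(A;B) in nats for a 2-D joint probability table p(a, b)."""
    p_ab = np.asarray(p_ab, dtype=float)
    outer = p_ab.sum(axis=1, keepdims=True) @ p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0  # terms with p(a, b) = 0 contribute nothing
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / outer[mask])))


def _marginals(p_x, p_y_given_x, p_t_given_x, eps):
    """p(t) and the decoder p(y|t) implied by the encoder p(t|x)."""
    p_t = p_x @ p_t_given_x                               # p(t) = sum_x p(x) p(t|x)
    p_ty = (p_x[:, None] * p_t_given_x).T @ p_y_given_x   # p(t, y) = sum_x p(x) p(t|x) p(y|x)
    return p_t, p_ty / np.maximum(p_t[:, None], eps)


def information_bottleneck(p_xy, n_clusters, beta, n_iter=300, seed=0, eps=1e-12):
    """Iterate the IB self-consistent equations (hypothetical helper; t stands for x~)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1)                                # assumes p(x) > 0 for every x
    p_y_given_x = p_xy / p_x[:, None]

    rng = np.random.default_rng(seed)
    p_t_given_x = rng.random((p_x.size, n_clusters))      # random soft initial encoder
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t, p_y_given_t = _marginals(p_x, p_y_given_x, p_t_given_x, eps)
        # distortion d(x, t) = KL[p(y|x) || p(y|t)], shape (|X|, |T|)
        log_ratio = (np.log(np.maximum(p_y_given_x[:, None, :], eps))
                     - np.log(np.maximum(p_y_given_t[None, :, :], eps)))
        d = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # encoder update p(t|x) proportional to p(t) exp(-beta d(x, t)), done in log space
        logits = np.log(np.maximum(p_t, eps))[None, :] - beta * d
        logits -= logits.max(axis=1, keepdims=True)
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    # recompute the decoder so it matches the final encoder
    p_t, p_y_given_t = _marginals(p_x, p_y_given_x, p_t_given_x, eps)
    p_xt = p_x[:, None] * p_t_given_x                     # joint p(x, t)
    p_ty = p_xt.T @ p_y_given_x                           # joint p(t, y), via the chain T - X - Y
    return p_t_given_x, p_y_given_t, mutual_information(p_xt), mutual_information(p_ty)


if __name__ == "__main__":
    # toy joint: x0, x1 mostly co-occur with y0; x2, x3 mostly with y1
    p_xy = np.array([[0.24, 0.01], [0.24, 0.01], [0.01, 0.24], [0.01, 0.24]])
    enc, dec, i_xt, i_ty = information_bottleneck(p_xy, n_clusters=2, beta=50.0)
    print("I(X;T) =", i_xt, "I(T;Y) =", i_ty, "I(X;Y) =", mutual_information(p_xy))
```

Each step is a closed-form update, which mirrors the alternating structure of Blahut-Arimoto. In this sketch a large beta drives the encoder toward hard clusters that keep almost all of I(X; Y), while beta = 0 makes p(x̃|x) independent of x, so the code carries no information at all.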

Description

The Information Bottleneck Method
