Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps; Coifman, Lafon, Lee, Maggioni, Nadler, Warner, Zucker - manifold learning
We study the general question of how visual information is transformed between the lateral geniculate nucleus of the thalamus (LGN) and layer 4 of the primary visual cortex. LGN cells receive visual input from one eye and are not sensitive to an object's orientation or direction of movement. Cortical cells often receive binocular inputs and are usually orientation and direction selective. We use a number of techniques to explore how these transformations come about.
In our electrophysiological studies, we record the activity of many individual neurons simultaneously in both thalamus and cortex. In the cat, we are studying the cortical mechanisms responsible for the selectivity for orientation and direction of motion in simple cells. In the macaque, we concentrate on the first stages of color processing in the cortex. We have found that the wiring of the direct inputs to cortex is extremely precise. Given the visual properties of any single layer 4 cortical neuron, virtually all of the thalamic neurons that would help it perform this function are directly connected to it. In order to study the facilitatory interactions between these multiple inputs to cortical neurons, we are currently using multielectrode arrays to record up to ten neurons in the thalamus along with several of their potential targets.
In related projects we are using optical imaging, a technique for mapping the function of neural populations in vivo. These studies produce maps of the visual cortex that show the clustering of neurons with different receptive field properties. Functional maps allow us to target specific types of neurons (such as color-selective cells in the macaque) for electrophysiological study.
Object recognition requires that you know when two shapes are 'similar'. But what does similar mean? The mathematician says: make the set of all (two-dimensional, three-dimensional or higher) shapes into the points of an infinite-dimensional space and then put a metric on this space reflecting what 'similar' means. The background image is supposed to suggest this construction: here eggs of varying shapes are each placed in their own pigeonhole. If, for example, our 'shapes' are taken to be open subsets of Euclidean space with smooth boundaries, then this space will be a Banach or Fréchet manifold, but a highly non-linear one. The question of finding the right mathematical model for the space of such shapes is not unlike moduli problems, and I tried to get a grip on this as soon as I looked at vision problems.
Around 2004 I met Peter Michor and found that he had systematically developed the foundations of the differential geometry of such infinite-dimensional spaces. This seemed to be the right tool for studying the above spaces of shapes. Since then, we have been studying various Riemannian metrics on them and their associated completions, the geodesics in these metrics and the curvature of the space, and examples and applications to object recognition.
Abstract
The Earth Mover's Distance (EMD) between two weighted point sets (point distributions) is a distance measure commonly used in computer vision for color-based image retrieval and shape matching. It measures the minimum amount of work needed to transform one set into the other one by weight transportation. We study the following shape matching problem: Given two weighted point sets A and B in the plane, compute a rigid motion of A that minimizes its Earth Mover's Distance to B. No algorithm is known that computes an exact solution to this problem. We present simple FPTASs and polynomial-time (2+ε)-approximation algorithms for the minimum Euclidean EMD between A and B under translations and rigid motions.
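
To make the definition concrete, the following is a minimal sketch of the EMD in a restricted special case, not the algorithms of the abstract. It assumes both sets have the same number of points and that every point carries the same weight, in which case an optimal transport plan can be taken to be a one-to-one matching; the sketch finds it by brute force, so it is only usable for a handful of points. The function names (emd_uniform, translate, centroid) are hypothetical and chosen for illustration. Aligning centroids is shown only as one simple candidate translation, without any claim about its approximation quality.

```python
from itertools import permutations
from math import hypot


def emd_uniform(a, b):
    """EMD between two equal-size planar point sets with uniform weights.

    Assumption: each point has weight 1/n, so the total weight is 1 and the
    minimum-work transport plan is a perfect matching. All n! matchings are
    tried, which is feasible only for very small n.
    """
    if len(a) != len(b) or not a:
        raise ValueError("point sets must be non-empty and of equal size")
    n = len(a)
    best = min(
        sum(hypot(a[i][0] - b[j][0], a[i][1] - b[j][1])
            for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n  # each matched pair moves weight 1/n


def translate(points, dx, dy):
    """Return a copy of the point set shifted by the vector (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def centroid(points):
    """Center of mass of a uniformly weighted point set."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    B = translate(A, 3.0, 4.0)  # B is A shifted by the vector (3, 4)

    print(emd_uniform(A, B))

    # Candidate translation: move the centroid of A onto the centroid of B.
    (ax, ay), (bx, by) = centroid(A), centroid(B)
    A_shifted = translate(A, bx - ax, by - ay)
    print(emd_uniform(A_shifted, B))
```

In this toy instance B is an exact translate of A, so the EMD before alignment equals the length of the shift vector and the EMD after centroid alignment is zero up to rounding. For general weighted sets the matching shortcut no longer applies, and the EMD has to be computed as a transportation (minimum-cost flow) problem.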