V2 and beyond?

Patrik Hoyer, Aapo Hyvärinen, and Michael Gutmann

Previous studies (see our other projects) have shown how many basic properties of the primary visual cortex, such as the receptive fields of simple and complex cells and the spatial organization (topography) of the cells, can be understood as efficient coding of natural images. In this project we extend the framework by considering how the responses of complex cells (when fed with natural image input) could be sparsely represented by a higher-order neural layer.

We model complex-cell activations by a simple energy model consisting of quadrature Gabor filters, squaring, and summing (this part of our model was fixed, not learned). This is depicted in the figure below:
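As a rough illustration of the energy model described above, the following sketch computes the response of a single model complex cell. The function names (`gabor_pair`, `complex_cell_energy`) and all numerical parameters (patch size, wavelength, envelope width) are assumptions chosen for illustration; the page does not state the values used in the actual model.

```python
import numpy as np

def gabor_pair(size, center, theta, wavelength, sigma):
    """Return an even/odd (quadrature) pair of Gabor filters on a size-by-size grid.

    All parameter values passed in below are illustrative assumptions.
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = xs - center[0], ys - center[1]
    # Coordinate along which the sinusoidal carrier oscillates
    u = dx * np.cos(theta) + dy * np.sin(theta)
    envelope = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * u / wavelength
    return envelope * np.cos(phase), envelope * np.sin(phase)

def complex_cell_energy(patch, even, odd):
    """Energy model: filter with both quadrature filters, square, and sum."""
    return np.sum(patch * even) ** 2 + np.sum(patch * odd) ** 2

# Usage: gratings that differ only in phase give nearly the same energy,
# while a grating at the orthogonal orientation gives almost none.
size, wavelength = 16, 6.0
even, odd = gabor_pair(size, center=(7.5, 7.5), theta=0.0,
                       wavelength=wavelength, sigma=3.0)
xs = np.tile(np.arange(size, dtype=float), (size, 1))
grating_a = np.cos(2.0 * np.pi * xs / wavelength)
grating_b = np.cos(2.0 * np.pi * xs / wavelength + 1.0)  # shifted phase
energy_a = complex_cell_energy(grating_a, even, odd)
energy_b = complex_cell_energy(grating_b, even, odd)
energy_orthogonal = complex_cell_energy(grating_a.T, even, odd)
```

Squaring and summing the two quadrature outputs is what makes the response approximately independent of the phase of the stimulus, which is the defining property of a complex cell in this kind of model.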

We first sampled patches from a set of images of natural scenes. Then, we calculated our model complex-cell responses to these patches. For simplicity of interpretation and for computational reasons, we restricted our analysis to a single spatial scale, and the cells were placed on a rectangular 6-by-6 grid with 4 differently oriented cells at each location (144 model cells in total). Three patches and their corresponding responses are shown below. (The ellipses show the orientation and approximate extent of the individual complex cells. The brightness of each ellipse indicates the response strength.)
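The sampling and response computation can be sketched as follows. The 6-by-6 grid and the 4 orientations come from the text; the patch size, wavelength, envelope width, and the random array standing in for a natural image are assumptions made only so that the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 24                      # patch side in pixels (assumed)
GRID = 6                        # 6-by-6 grid of cell locations (from the text)
N_ORIENT = 4                    # 4 orientations per location (from the text)
WAVELENGTH, SIGMA = 6.0, 3.0    # single spatial scale (assumed values)

def quadrature_bank():
    """Build even and odd Gabor filters for every grid location and orientation."""
    ys, xs = np.mgrid[0:PATCH, 0:PATCH].astype(float)
    centers = (np.arange(GRID) + 0.5) * PATCH / GRID
    even, odd = [], []
    for cy in centers:
        for cx in centers:
            for k in range(N_ORIENT):
                theta = k * np.pi / N_ORIENT
                dx, dy = xs - cx, ys - cy
                u = dx * np.cos(theta) + dy * np.sin(theta)
                env = np.exp(-(dx**2 + dy**2) / (2.0 * SIGMA**2))
                even.append((env * np.cos(2.0 * np.pi * u / WAVELENGTH)).ravel())
                odd.append((env * np.sin(2.0 * np.pi * u / WAVELENGTH)).ravel())
    return np.array(even), np.array(odd)   # each has shape (144, PATCH * PATCH)

def sample_patches(image, n):
    """Cut n random PATCH-by-PATCH patches out of a 2-D image, one per row."""
    rows = rng.integers(0, image.shape[0] - PATCH + 1, size=n)
    cols = rng.integers(0, image.shape[1] - PATCH + 1, size=n)
    return np.stack([image[r:r + PATCH, c:c + PATCH].ravel()
                     for r, c in zip(rows, cols)])

def complex_cell_responses(patches, even, odd):
    """Energy-model responses: one row per patch, one column per model cell."""
    return (patches @ even.T) ** 2 + (patches @ odd.T) ** 2

image = rng.standard_normal((128, 128))   # stand-in for a natural image (assumption)
even, odd = quadrature_bank()
patches = sample_patches(image, 50)
X = complex_cell_responses(patches, even, odd)   # shape (50, 144), non-negative
```

Each row of `X` is one non-negative 144-dimensional response vector, which is the kind of data the next modelling stage takes as input.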

Next, we model our data (complex-cell activations) with the linear sparse coding (ICA) model. In other words, we seek a representation

$$\mathbf{x} = \sum_{i} \mathbf{a}_i s_i$$

such that the stochastic coefficients $s_i$ are sparse and independent. Here $\mathbf{x}$ is the vector of complex-cell activations and the $\mathbf{a}_i$ are the basis vectors. As our input data is non-negative, we require the same of our model (both basis vectors and coefficients). This is a modification of standard ICA.
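One possible way to estimate such a non-negative sparse representation is sketched below. It approximates a non-negative data matrix `X` (one response vector per column) as `A @ S`, alternating a projected gradient step on the basis `A` with a multiplicative update on the coefficients `S`, in the style of non-negative sparse coding. This is a minimal sketch under assumptions: the function name, the sparsity weight `lam`, the iteration count, and the synthetic data are illustrative, and the procedure is not necessarily the one used to produce the results reported here.

```python
import numpy as np

def nonneg_sparse_coding(X, n_units, lam=0.1, n_iter=200, seed=0):
    """Approximately minimize 0.5*||X - A S||^2 + lam*sum(S) with A, S >= 0.

    X has shape (n_cells, n_samples); columns of A are kept at unit norm.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    A = rng.random((X.shape[0], n_units))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    S = rng.random((n_units, X.shape[1]))
    for _ in range(n_iter):
        # Projected gradient step on the basis, then renormalize the columns
        step = 1.0 / (np.linalg.norm(S, 2) ** 2 + eps)
        A = np.maximum(0.0, A - step * (A @ S - X) @ S.T)
        A /= np.maximum(np.linalg.norm(A, axis=0, keepdims=True), eps)
        # Multiplicative update keeps the coefficients non-negative
        S *= (A.T @ X) / (A.T @ A @ S + lam)
    return A, S

# Usage on synthetic non-negative data with sparse generating coefficients
rng = np.random.default_rng(1)
A_true = rng.random((20, 10))
S_true = rng.exponential(1.0, (10, 200)) * (rng.random((10, 200)) < 0.2)
X = A_true @ S_true
A, S = nonneg_sparse_coding(X, n_units=10)
rel_err = np.linalg.norm(X - A @ S) / np.linalg.norm(X)
```

The penalty `lam * sum(S)` plays the role of the sparseness prior, and the non-negativity of both factors is what distinguishes this setting from standard ICA.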

A small subset (16 of 288) of the estimated basis patterns is shown below:

Each basis pattern corresponds to a higher-order unit that represents that particular kind of input. As can be readily seen, our higher-order units code for collinear active complex cells, essentially signalling the presence of part of a contour. The units have varying preferred lengths, with some coding the activation of only a single complex cell whereas others represent longer contours.

Because competition between units is an important part of the representation, the varying length preference leads to length tuning: short units are end-stopped, whereas units coding longer contours (collator units) do not respond at all to short contours. This is shown in our paper (Hoyer and Hyvärinen, 2002). The paper also describes how contour integration could be interpreted as top-down feedback in the presented model.
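The effect of competition on length tuning can be illustrated with a toy example. The sketch below assumes a row of five collinear complex cells and two hand-made higher-order units, one short and one long; these basis vectors, the sparsity weight `lam`, and the projected gradient inference are illustrative assumptions and are not taken from the estimated model.

```python
import numpy as np

def infer_nonneg_sparse(x, A, lam=0.1, n_iter=500):
    """Minimize 0.5*||x - A s||^2 + lam*sum(s) subject to s >= 0 (projected gradient)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ s - x) + lam
        s = np.maximum(0.0, s - step * grad)
    return s

n_cells = 5                                      # five collinear complex cells (toy setup)
short_unit = np.zeros(n_cells)
short_unit[2] = 1.0                              # codes only the middle cell
long_unit = np.ones(n_cells) / np.sqrt(n_cells)  # codes the whole contour
A = np.column_stack([short_unit, long_unit])

short_contour = short_unit.copy()                # only the middle cell is active
long_contour = np.ones(n_cells)                  # all five cells are active

s_short = infer_nonneg_sparse(short_contour, A)  # [short response, long response]
s_long = infer_nonneg_sparse(long_contour, A)
feedforward_long_to_short = long_unit @ short_contour
```

In this toy setting the long unit has a positive feed-forward overlap with the short contour (`feedforward_long_to_short` is about 0.45), yet its inferred response is zero because the short unit already explains the input. Conversely, the short unit stays silent for the long contour, which is the end-stopping behaviour described above.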

In our recent paper (Hyvärinen et al., 2005) we show that when the complex cell set is expanded to include cells of different preferred frequencies, many higher-order units pool responses over different frequencies. The pooling is coherent in the sense that the complex cells that are pooled together have similar locations and orientations, forming something like a sharp wide-band edge.

These results pave the way for a theory-driven neuroscience, where experimental hypotheses are obtained from computational and statistical theories.

References

P.O. Hoyer and A. Hyvärinen, "A multi-layer sparse coding network learns contour coding from natural images", Vision Research, vol. 42, no. 12, pp. 1593-1605, 2002.
A. Hyvärinen, M. Gutmann and P.O. Hoyer, "Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2", BMC Neuroscience, vol. 6, no. 12, 2005.
