Computational vision
The new study "A Feedforward Architecture Accounts for Rapid Categorization" (Serre, T., A. Oliva and T. Poggio, PNAS 2007, in press [not online yet]) reports the success of a computational model of vision based on how the visual cortex performs immediate object recognition. The feedforward model is based on what the ventral stream processes in the first 100-200 milliseconds after an image appears, before cognitive feedback loops kick in. It recognizes objects in a database of street scenes with reasonable accuracy, and it uses a learning algorithm to get better at categorizing new objects. In this study, the system was trained on a set of images and then pitted against human observers. Both performed nearly the same, with over 90% accuracy for close-ups and 74% for distant views.
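The train-then-test protocol described above can be sketched in a few lines. This is a minimal, hypothetical illustration and not the authors' code: the made-up two-number feature vectors stand in for the model's top-layer outputs, and the perceptron is a simple stand-in learner, not necessarily the classifier used in the papers.

```python
"""Hypothetical sketch: learn a category boundary on top of fixed
feedforward features, then score the learner on unseen examples."""


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron; labels are +1 (target category) or -1 (other)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def accuracy(w, b, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(w, b, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up "feature vectors" (hypothetical data, not from the study).
    train_x = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
               [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
    train_y = [1, 1, 1, -1, -1, -1]
    test_x = [[0.85, 0.15], [0.15, 0.85]]  # held-out examples
    test_y = [1, -1]
    w, b = train_perceptron(train_x, train_y)
    print(accuracy(w, b, test_x, test_y))  # prints: 1.0
```

The point of the split is the same as in the study: the learner only sees the training images, and its accuracy is measured on examples it has never encountered.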
Thomas Serre, Tomaso Poggio and others at the Center for Biological and Computational Learning in the McGovern Institute, the Department of Brain and Cognitive Sciences, and the Computer Science and Artificial Intelligence Lab at MIT collaborated on the system. Another new paper, Robust Object Recognition with Cortex-Like Mechanisms, Serre et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 29 No 3, March 2007 [free PDF], describes how the model was developed. The feedforward model uses four layers and is built on four principles of visual cortex processing:
- Visual processing is hierarchical, aiming to build invariance to position and scale first and then to viewpoint and other transformations.
- Along the hierarchy, the receptive fields of the neurons (i.e., the part of the visual field that could potentially elicit a response from the neuron) as well as the complexity of their optimal stimuli (i.e., the set of stimuli that elicit a response from the neuron) increase.
- The initial processing of information is feedforward (for immediate recognition tasks, i.e., when the image presentation is rapid and there is no time for eye movements or shifts of attention).
- Plasticity and learning probably occur at all stages and certainly at the level of inferotemporal (IT) cortex and prefrontal cortex (PFC), the top-most layers of the hierarchy.
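The first three principles (a hierarchy, growing receptive fields, and purely feedforward processing) can be illustrated with a toy example. The sketch below is a hypothetical, heavily simplified 1-D illustration, not the authors' released software: `s_layer` applies Gaussian-tuned template matching (simple units), `c_layer` takes a local max (complex units), and a final global max summarizes the response. The template, pool size, stride, and `sigma` are arbitrary assumptions chosen for readability.

```python
"""Toy 1-D sketch of an alternating S (template-matching) / C (max-pooling)
feedforward hierarchy. Hypothetical illustration only."""

import math


def s_layer(signal, template, sigma=1.0):
    """Simple (S) units: Gaussian tuning to a template at every position.

    A unit responds with 1.0 when its input patch exactly matches the
    template, and less as the patch moves away from it.
    """
    k = len(template)
    responses = []
    for i in range(len(signal) - k + 1):
        patch = signal[i:i + k]
        dist2 = sum((p - t) ** 2 for p, t in zip(patch, template))
        responses.append(math.exp(-dist2 / (2 * sigma ** 2)))
    return responses


def c_layer(responses, pool=4, stride=2):
    """Complex (C) units: max over a local pool of neighboring S units.

    Taking the max makes each C unit tolerant to shifts of the feature
    within its pool, and widens its receptive field.
    """
    return [max(responses[i:i + pool])
            for i in range(0, len(responses) - pool + 1, stride)]


def feedforward(signal, template):
    """One S/C stage followed by a global max over all C units.

    Information flows strictly bottom-up; there is no feedback step.
    """
    c1 = c_layer(s_layer(signal, template))
    return max(c1)


if __name__ == "__main__":
    edge = [0.0, 1.0, 0.0]  # hypothetical feature template
    image = [0.0] * 12
    image[3:6] = edge       # feature at position 3
    shifted = [0.0] * 12
    shifted[6:9] = edge     # same feature, shifted by 3 positions
    print(feedforward(image, edge), feedforward(shifted, edge))  # prints: 1.0 1.0
```

Because the C units pool with a max, the same feature placed at two different positions yields the same top-level response, a small-scale version of the position invariance in the first principle. Each C unit also covers six input positions versus three for an S unit, mirroring the growth of receptive fields along the hierarchy.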
Poggio said, “We have not solved vision yet, but this model of immediate recognition may provide the skeleton of a theory of vision. The huge task in front of us is to incorporate into the model the effects of attention and top-down beliefs.”
Their next goal is to study the 200-300 milliseconds that follow the feedforward process of immediate recognition; a larger goal is to incorporate cognitive feedback loops. The feedforward model may ultimately be useful as a front end to more complex processing systems. Bigger implications:
This new study supports a long-held hypothesis that rapid categorization happens without any feedback from cognitive or other areas of the brain. The results also indicate that the model can help neuroscientists make predictions and design new experiments to explore the brain mechanisms involved in human visual perception, cognition, and behavior. Deciphering the relative contributions of feedforward and feedback processing may eventually help explain neuropsychological disorders such as autism and schizophrenia. The model also helps bridge artificial intelligence (AI) and neuroscience, since it may lead to better artificial vision systems and augmented sensory prostheses.
Read more.
Download the open-source software along with the StreetScenes dataset.
More research from the MIT CBCL lab.
x-posted to Omni Brain