Acción Integrada Hispano-Austriaca AT2009-0026 2009-2010

Research lines

Models of artificial visual attention

Two psychological theories of visual attention have mainly influenced today's computational models: feature integration theory and guided search. The feature integration theory of Treisman and Gelade (Treisman and Gelade, 1980) suggests that the human visual system detects separable features in parallel at an early stage of the attention process. These features are then spatially combined, and each relevant location is finally attended to individually. Methods that follow this theory compute image features in a number of parallel channels in a pre-attentive, task-independent stage. The extracted features are integrated into a single saliency map which encodes the saliency of each image pixel (Itti et al., 1998; Koch and Ullman, 1985; Neri, 2004; Aziz, 2009).

While feature integration theory is mainly based on the bottom-up component of attention, the guided search theory proposed by Wolfe et al. (Wolfe et al., 1989) centers on the observation that a top-down component can speed up the detection of a target in a scene. The model computes a set of features over the image, and the top-down component activates the locations that might contain the features of the searched target.

These two approaches are not mutually exclusive, and current work in computational attention aims at models which combine a bottom-up pre-attentive stage with a top-down attentive stage (Navalpakkam and Itti, 2005). The idea is that, while the bottom-up stage is independent of the task, the top-down component models the influence of the task currently being executed on the attention process. For instance, Navalpakkam and Itti extended Itti's model (Itti et al., 1998) by building a multiscale object representation in a long-term memory. The multiscale object features stored in this memory determine the relevance of the scene features depending on the current task.
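The channel-based pipeline described above can be illustrated with a minimal sketch. It is not the implementation of Itti et al. or of Navalpakkam and Itti: the three colour channels, the box-filter center-surround contrast, the min-max normalization and all function names (`feature_channels`, `center_surround`, `saliency_map`) are assumptions chosen for brevity. The optional `weights` argument is a hypothetical stand-in for a top-down component that boosts the channels matching the searched target.

```python
import numpy as np


def box_blur(channel, radius):
    """Mean filter of a 2-D array, computed with an integral image (edge-padded)."""
    size = 2 * radius + 1
    padded = np.pad(channel, radius, mode="edge")
    # Prepend a zero row and column so window sums reduce to four lookups.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    window_sum = (integral[size:, size:] - integral[:-size, size:]
                  - integral[size:, :-size] + integral[:-size, :-size])
    return window_sum / (size * size)


def normalize(feature_map):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = feature_map.min(), feature_map.max()
    if hi - lo < 1e-12:
        return np.zeros_like(feature_map, dtype=float)
    return (feature_map - lo) / (hi - lo)


def feature_channels(rgb):
    """Split an H x W x 3 image into simple parallel channels (assumed set)."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    return {
        "intensity": (r + g + b) / 3.0,
        "red_green": r - g,
        "blue_yellow": b - (r + g) / 2.0,
    }


def center_surround(channel, center_radius=1, surround_radius=4):
    """Local contrast: fine-scale mean minus coarse-scale mean, in absolute value."""
    return np.abs(box_blur(channel, center_radius) - box_blur(channel, surround_radius))


def saliency_map(rgb, weights=None):
    """Integrate per-channel contrast maps into one saliency map.

    weights=None gives a purely bottom-up, task-independent map;
    a dict of channel weights mimics a top-down bias (hypothetical).
    """
    channels = feature_channels(rgb)
    if weights is None:
        weights = {name: 1.0 for name in channels}
    total = sum(weights.get(name, 0.0) * normalize(center_surround(channel))
                for name, channel in channels.items())
    return normalize(total)


# Usage: a grey scene containing one small red patch.
image = np.full((32, 32, 3), 0.5)
image[14:18, 14:18] = (1.0, 0.0, 0.0)
bottom_up = saliency_map(image)
biased = saliency_map(image, weights={"red_green": 1.0})
print(np.unravel_index(np.argmax(bottom_up), bottom_up.shape))
```

With `weights=None` every channel contributes equally, which corresponds to the task-independent stage. Passing a weight dictionary keeps the same feature maps and only changes how they are combined, which is one simple way to picture a top-down influence.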

The aforementioned computational models are space-based methods, which allocate attention to a region of the scene rather than to an object or proto-object. An alternative to space-based methods was proposed by Sun and Fisher (Sun and Fisher, 2003). They present a grouping-based saliency method and a hierarchical selection of attention at different perceptual levels (points, regions or objects). A drawback of this model is that the groups are drawn manually. Orabona et al. (Orabona et al., 2007) propose a model of visual attention based on the concept of 'proto-objects', units of visual information that can be bound into a coherent and stable object. They compute these proto-objects in a pre-attentive stage by segmenting the input image with the watershed transform, using edge and colour features. The saliency of each proto-object is then computed taking into account top-down information about the object to be searched for, which depends on the task. Yu et al. (Yu et al., 2010) propose a model of attention in which the scene is first segmented into 'proto-objects' in a pre-attentive, bottom-up stage based on Gestalt theories. Then, in a top-down way, the saliency of each proto-object is computed according to the current task, using models of the objects that are relevant to this task. These models are stored in a long-term memory.
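The difference between space-based and proto-object-based selection can be sketched as follows. The example assumes that a segmentation into proto-objects is already available as an integer label map (the models above obtain it with the watershed transform or Gestalt grouping, which is not reproduced here). Averaging pixel saliency per segment and scaling it by a per-object task relevance is an illustrative assumption, not the scoring used by Orabona et al. or Yu et al.; the names `proto_object_saliency` and `attend` are hypothetical.

```python
import numpy as np


def proto_object_saliency(labels, pixel_saliency, relevance=None):
    """Mean pixel saliency per proto-object, optionally scaled by task relevance.

    labels         : H x W array of non-negative integer segment ids
    pixel_saliency : H x W array of bottom-up saliency values
    relevance      : optional sequence with one top-down weight per segment id
    """
    flat_labels = labels.ravel()
    n_objects = int(flat_labels.max()) + 1
    sums = np.bincount(flat_labels, weights=pixel_saliency.ravel(), minlength=n_objects)
    counts = np.bincount(flat_labels, minlength=n_objects)
    # Segment ids that never occur keep a score of zero.
    scores = np.divide(sums, counts, out=np.zeros(n_objects), where=counts > 0)
    if relevance is not None:
        scores = scores * np.asarray(relevance, dtype=float)
    return scores


def attend(labels, pixel_saliency, relevance=None):
    """Return the id and pixel mask of the most salient proto-object."""
    scores = proto_object_saliency(labels, pixel_saliency, relevance)
    winner = int(np.argmax(scores))
    return winner, labels == winner


# Usage: three proto-objects in a 4 x 4 scene.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 2],
                   [2, 2, 2, 2]])
pixel_saliency = np.array([[0.2, 0.2, 0.8, 0.8],
                           [0.2, 0.2, 0.8, 0.8],
                           [0.5, 0.5, 0.5, 0.5],
                           [0.5, 0.5, 0.5, 0.5]])
winner, mask = attend(labels, pixel_saliency)
task_winner, _ = attend(labels, pixel_saliency, relevance=[0.0, 0.5, 1.0])
print(winner, task_winner)
```

Attention is assigned to a whole segment instead of a pixel neighbourhood, and a task-dependent relevance vector can change which proto-object wins without recomputing the bottom-up map.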

A.M. Treisman and G. Gelade. A feature integration theory of attention. Cognitive Psychology 12(1): 97-136, 1980.

L. Itti, C. Koch and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence 20: 1254-1259, 1998.

C. Koch and S. Ullman. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4: 219-227, 1985.

P. Neri. Attentional effects on sensory tuning for single-feature detection and double-feature conjunction. Vision Research, 3053-3064, 2004.

M.Z. Aziz. Behavior adaptive and real-time model of integrated bottom-up and top-down visual attention. PhD thesis, Fakultät für Elektrotechnik, Informatik und Mathematik, Universität Paderborn, 2009.

J.M. Wolfe, K.R. Cave and S.L. Franzel. Guided Search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance 15: 419-433, 1989.

V. Navalpakkam and L. Itti. Modeling the influence of task on attention. Vision Research 45(2): 205-231, 2005.

Y. Sun and R.B. Fisher. Object-based visual attention for computer vision. Artificial Intelligence 146 (1): 77-123, 2003.

F. Orabona, G. Metta and G. Sandini. A proto-object based visual attention model. In L. Paletta and E. Rome (eds.) WAPCV 2007, LNAI 4840: 198-215, Springer, Heidelberg 2007.

Y. Yu, G.K.I. Mann and R.G. Gosine. An object-based visual attention model for robotic applications. IEEE Trans. on Systems, Man and Cybernetics 40(3): 1-15, 2010.

Last modification: 30.05.2011