Two psychological theories of visual attention have mainly influenced
the computational models that exist today: feature integration theory
and guided search. Feature integration theory, proposed by Treisman and
Gelade (Treisman and Gelade, 1980), suggests that the human visual
system detects separable features in parallel in an early step of the
attention process. These features are then spatially combined so that
each relevant location can finally be attended to individually.
According to this model, methods compute image features in a number of
parallel channels in a preattentive, task-independent stage. The
extracted features are
integrated into a single saliency map which codes the saliency of each
image pixel (Itti et al, 1998; Koch and
Ullman, 1985; Neri, 2004; Aziz, 2009). While feature integration theory
is based mainly on a bottom-up component of attention, the guided search
theory proposed by Wolfe et al (Wolfe et al, 1989) centers on the fact
that a top-down component of attention can speed up the process of
identifying the presence of a target in a
scene. The model computes a set of features over the image and the
top-down component activates locations that might contain the features of
the searched target. These two approaches are not mutually exclusive,
and current efforts in computational attention aim to develop models
that combine a bottom-up preattentive stage with a top-down attentive
stage (Navalpakkam and Itti, 2005). The idea is that, while the
bottom-up step is independent of the task, the top-down component tries
to model the influence of the task currently being executed on the
attention process. For instance, Navalpakkam and Itti extended Itti's
model (Itti et al, 1998) by building a multiscale object representation
in a long-term memory. The multiscale object features stored in this
memory determine the relevance of the scene features depending on the
task currently being executed.
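The combination of a bottom-up stage with a top-down stage described
above can be illustrated with a minimal sketch. It is not the
implementation of any of the cited models: the 3x3 local-contrast
operator, the min-max normalisation, the weighted average used for
integration, and all function and channel names (local_contrast,
normalise, saliency_map, peak, "intensity", "red") are assumptions made
for illustration only. Each feature channel is processed independently,
the resulting maps are integrated into one saliency map, and optional
task weights play the role of the top-down component.

```python
def local_contrast(channel):
    """Absolute difference between each pixel and the mean of its 3x3
    neighbourhood (an assumed, very crude centre-surround operator)."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [channel[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = abs(channel[y][x] - sum(neigh) / len(neigh))
    return out


def normalise(fmap):
    """Rescale a feature map to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in fmap)
    hi = max(max(row) for row in fmap)
    if hi == lo:
        return [[0.0] * len(row) for row in fmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in fmap]


def saliency_map(channels, weights=None):
    """Integrate per-channel feature maps into a single saliency map.

    channels: dict mapping a channel name to a 2D list of values.
    weights:  optional dict of task-dependent gains (top-down component).
              With weights=None every channel counts equally, which
              corresponds to a purely bottom-up, task-independent map.
    """
    names = list(channels)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total == 0:
        raise ValueError("at least one channel weight must be non-zero")
    maps = {name: normalise(local_contrast(channels[name])) for name in names}
    h, w = len(maps[names[0]]), len(maps[names[0]][0])
    return [[sum(weights[n] * maps[n][y][x] for n in names) / total
             for x in range(w)] for y in range(h)]


def peak(smap):
    """Return the (row, column) of the most salient pixel."""
    coords = ((y, x) for y in range(len(smap)) for x in range(len(smap[0])))
    return max(coords, key=lambda p: smap[p[0]][p[1]])


# Toy 5x5 scene: one bright pixel and one reddish pixel.
intensity = [[0.0] * 5 for _ in range(5)]
red = [[0.0] * 5 for _ in range(5)]
intensity[1][1] = 1.0
red[3][3] = 0.5
channels = {"intensity": intensity, "red": red}

bottom_up = saliency_map(channels)                       # task-independent
top_down = saliency_map(channels, {"intensity": 1.0, "red": 3.0})
print(peak(top_down))  # (3, 3): the task weights favour the red target
```

In the purely bottom-up map both locations are equally salient; the
task weights break the tie in favour of the location that matches the
searched feature.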
The aforementioned computational models are space-based methods, which
allocate attention to a region of the scene rather than to an object or
proto-object. An alternative to space-based methods was proposed by Sun
and Fisher (Sun and Fisher, 2003). They present a grouping-based
saliency method and a hierarchical selection of attention at different
perceptual levels (points, regions or objects). A drawback of this model
is that the groups must be drawn manually. Orabona et al (Orabona
et al, 2007) propose a model of visual attention based on the
concept of 'proto-objects' as units of visual information that can be
bound into a coherent and stable object. They compute these proto-objects
by employing the watershed transform to segment the input image using
edge and colour features in a preattentive stage. The saliency of each
proto-object is computed taking into account top-down information about
the object to search for, which depends on the task. Yu et al (Yu et al,
2010) propose a model of attention in which the scene is first
segmented, in a preattentive stage, into 'proto-objects' in a bottom-up
manner using Gestalt theories. Then, in a top-down way, the saliency of
the proto-objects is computed taking into account the current task,
using models of the objects that are relevant to this task. These models
are stored in a long-term memory.
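The difference between space-based and proto-object-based selection can
be illustrated with a short sketch. It does not reproduce the method of
Orabona et al or of Yu et al: the segmentation is assumed to be given as
a label map (for instance, the output of a watershed step), saliency is
pooled by a simple mean over each region, and the per-region relevance
gains stand in for the top-down object models. The function name
proto_object_saliency and its parameters are hypothetical.

```python
from collections import defaultdict


def proto_object_saliency(labels, saliency, relevance=None):
    """Pool a pixel-wise saliency map over proto-objects.

    labels:    2D list, labels[y][x] is the proto-object id of a pixel
               (assumed to come from a preattentive segmentation).
    saliency:  2D list of pixel saliency values, same shape as labels.
    relevance: optional dict mapping a proto-object id to a task gain;
               ids that are missing keep a gain of 1.0.
    Returns a dict mapping each proto-object id to its saliency score.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label_row, sal_row in zip(labels, saliency):
        for label, value in zip(label_row, sal_row):
            sums[label] += value
            counts[label] += 1
    scores = {}
    for label in sums:
        gain = 1.0 if relevance is None else relevance.get(label, 1.0)
        scores[label] = gain * sums[label] / counts[label]
    return scores


# Toy scene with three proto-objects (ids 0, 1 and 2).
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 2, 2]]
saliency = [[0.1, 0.1, 0.8, 0.6],
            [0.1, 0.1, 0.6, 0.8],
            [0.4, 0.4, 0.4, 0.4]]

bottom_up_scores = proto_object_saliency(labels, saliency)
task_scores = proto_object_saliency(labels, saliency, relevance={2: 2.0})

# Attention is allocated to a whole proto-object, not to a pixel.
attended_bottom_up = max(bottom_up_scores, key=bottom_up_scores.get)
attended_for_task = max(task_scores, key=task_scores.get)
print(attended_bottom_up, attended_for_task)
```

Without task information the most conspicuous region (id 1) is
selected; once the task marks region 2 as relevant, that region wins
instead.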
A.M. Treisman and G. Gelade. A
feature integration theory of attention. Cognitive Psychology 12(1):
97-136, 1980.
L. Itti, C. Koch and E. Niebur. A
model of saliency-based visual attention for rapid scene analysis. IEEE
Trans. on Pattern Analysis and Machine Intelligence 20: 1254-1259, 1998.
C. Koch and S. Ullman. Shifts in
selective visual attention: Towards the underlying neural circuitry.
Human Neurobiology 4: 219-227, 1985.
P. Neri. Attentional effects on
sensory tuning for single-feature detection and double-feature
conjunction. Vision Research, 3053-3064, 2004.
M.Z. Aziz. Behavior adaptive and real-time
model of integrated bottom-up and top-down visual attention. PhD thesis,
Fakultät für Elektrotechnik, Informatik und Mathematik, Universität
Paderborn, 2009.
J.M. Wolfe, K.R. Cave and S.L.
Franzel. Guided Search: An alternative to the feature integration model
for visual search. Journal of Experimental Psychology: Human Perception
and Performance, 15: 419-433, 1989.
V. Navalpakkam and L. Itti.
Modeling the influence of task on attention. Vision Research 45(2):
205-231, 2005.
Y. Sun and R.B. Fisher.
Object-based visual attention for computer vision. Artificial
Intelligence 146 (1): 77-123, 2003.
F. Orabona, G. Metta and G.
Sandini. A proto-object based visual attention model. In L. Paletta and
E. Rome (eds.) WAPCV 2007, LNAI 4840: 198-215, Springer, Heidelberg 2007.
Y. Yu, G.K.I. Mann and R.G.
Gosine. An object-based visual attention model for robotic applications.
IEEE Trans. on Systems, Man and Cybernetics 40(3): 1-15, 2010.