Image understanding aims for
analysing an input image into its constituent patterns.
Depending on the task, we could be interested on
decomposing the image into regions which are homogeneous
according to some criteria such as colour or texture. This problem is called image
segmentation. The low-level features, like brightness,
colour, texture and/or motion attributes, should be used
to build sequentially a hierarchy of segmentations
(Shi
et al, 1997). A grouping method should have the
following properties (Felzenszwalb et al, 1998): i) to
capture perceptually important groupings or regions,
which reflect global aspects of the image, ii) to be
highly efficient, running in time linear in the number
of pixels, and iii) to create hierarchical segmentations
(Shi et al, 1997). In general, low-level cue image
segmentation cannot produce a complete final 'good'
segmentation (Borra et al, 1997).
A pyramid representation consists of a stack of layers,
each layer involving a number of processors (nodes).
Each processor has information about one part of the
image, called its receptive field. The bottom layer (layer
0) corresponds to the level of elements on the sensor (e.g.
number of pixels) . The number of processors in the
second layer (layer 1) is smaller by a factor of
λ - this factor is called a
reduction ratio, and it is greater than one.
Specifically, one processor (a 'parent') receives input
from its 'children' in the layer below. Processors on
higher layers receive only some information about the
content of their receptive fields. Pyramids are
hierarchical structures which have been widely used to
address the image segmentation task. In the pyramid hierarchy, each level is built by
computing a set of local operations over the level below.
These local operations adapt the hierarchy to the
topology of the image, allowing interesting features
such as the detection of global features of interest at
higher levels of the hierarchy and the reduction of the
complexity of the image segmentation task. If each level
of the pyramid is encoded as a graph, then the nodes of
any level over the base will be linked to the nodes of
the level above. That is, the pyramid provides multiple
representations with different levels of details (i.e.
resolution) for each region of the segmented image. This
property can be very interesting for certain tasks:
global relationships of the image can be analysed at
higher levels of the hierarchy where the whole image is
encoded using a reduced set of nodes; but detailed
versions of the regions associated to these nodes are
simultaneously available at low levels.
The efficiency of the pyramid in image segmentation is
strongly influenced by two characteristics: the data
structure used to encode every level of the pyramid
(horizontal relations) and the decimation scheme
employed to build one level from the level below
(vertical relations) (Brun and Kropatsch, 2003). The
data structure determines the information which may be
encoded at each level of the pyramid (e.g., simple
graphs, dual graphs, combinatorial maps...). Thus, it
roughly defines the horizontal relations of the entities
within the same level of the pyramid. On the contrary,
the decimation scheme employed to build the pyramid sets
the dynamic properties of the pyramid (height,
preservation of details, etc.). It corresponds to the
vertical relations between the entities between two
subsequent levels of the pyramid. Taking into account
these two characteristics, pyramids can be classified as
regular or irregular ones. Regular pyramids have a rigid
structure where the relationships among nodes of the
same level are fixed and the reduction ratio between the
number of nodes of two consecutive levels are constant
and bigger than one. In these hierarchies, the pyramid
structure can be adapted to the image topology by means
of the relationships among nodes of consecutive levels.
As it was pointed out by Bister et al (1990) or
Antonisse (1982), this is typically inadequate to
preserve the topology of the input image at higher
levels of the hierarchy. On the other hand, irregular
pyramids can adapt their spatial relationships (among
nodes of the same level and among nodes of consecutive
levels) to the content of the input image. Besides,
although the original proposals presented a serious
drawback with respect to computational efficiency, this
problem has been resolved by the last proposed
strategies (Brun and Kropatsch, 2003, Kropatsch, 2005).
H.J. Antonisse.
Image segmentation in pyramids. Comput. Graphics Image
Process. 19: 367–383, 1982.
M. Bister, J.
Cornelis, and A. Rosenfeld. A critical view of pyramid
segmentation algorithms. Pattern Recognition Letters 11:
605–617, 1990.
S. Borra and S.
Sarkar. A framework for performance characterization of
intermediate-level grouping modules. Pattern Recognition
and Image Analysis 19(11): 1306–1312, 1997.
L. Brun and W.G.
Kropatsch. Construction of combinatorial pyramids. In M.
Vento, E. Hancock (Ed.), GBRPR2003, LNCS 2726, 1–12,
Springer, 2003.
P.F.
Felzenszwalb and D.P. Huttenlocher. Image Segmentation
Using Local Variation. In IEEE Conf. Computer Vision
Pattern Recognition, 98–104, 1998.
W. Kropatsch,
Y. Haxhimusa, Z. Pizlo, and G. Langs. Vision pyramids
that do not grow too high. Pattern Recognition Letters
26(3): 319–337, 2005.
J. Shi and J.
Malik. Normalized Cuts and Image Segmentation. In IEEE
Conf. Computer Vision and Pattern Recognition, 731–737,
1997.
|