up

Irregular pyramids for image segmentation

Image understanding aims for analysing an input image into its constituent patterns. Depending on the task, we could be interested on decomposing the image into regions which are homogeneous according to some criteria such as colour or texture. This problem is called image segmentation. The low-level features, like brightness, colour, texture and/or motion attributes, should be used to build sequentially a hierarchy of segmentations (Shi et al, 1997). A grouping method should have the following properties (Felzenszwalb et al, 1998): i) to capture perceptually important groupings or regions, which reflect global aspects of the image, ii) to be highly efficient, running in time linear in the number of pixels, and iii) to create hierarchical segmentations (Shi et al, 1997). In general, low-level cue image segmentation cannot produce a complete final 'good' segmentation (Borra et al, 1997).

A pyramid representation consists of a stack of layers, each layer involving a number of processors (nodes). Each processor has information about one part of the image, called its receptive field. The bottom layer (layer 0) corresponds to the level of elements on the sensor (e.g. number of pixels) . The number of processors in the second layer (layer 1) is smaller by a factor of λ - this factor is called a reduction ratio, and it is greater than one. Specifically, one processor (a 'parent') receives input from its 'children' in the layer below. Processors on higher layers receive only some information about the content of their receptive fields. Pyramids are hierarchical structures which have been widely used to address the image segmentation task. In the pyramid hierarchy, each level is built by computing a set of local operations over the level below. These local operations adapt the hierarchy to the topology of the image, allowing interesting features such as the detection of global features of interest at higher levels of the hierarchy and the reduction of the complexity of the image segmentation task. If each level of the pyramid is encoded as a graph, then the nodes of any level over the base will be linked to the nodes of the level above. That is, the pyramid provides multiple representations with different levels of details (i.e. resolution) for each region of the segmented image. This property can be very interesting for certain tasks: global relationships of the image can be analysed at higher levels of the hierarchy where the whole image is encoded using a reduced set of nodes; but detailed versions of the regions associated to these nodes are simultaneously available at low levels.

The efficiency of the pyramid in image segmentation is strongly influenced by two characteristics: the data structure used to encode every level of the pyramid (horizontal relations) and the decimation scheme employed to build one level from the level below (vertical relations) (Brun and Kropatsch, 2003). The data structure determines the information which may be encoded at each level of the pyramid (e.g., simple graphs, dual graphs, combinatorial maps...). Thus, it roughly defines the horizontal relations of the entities within the same level of the pyramid. On the contrary, the decimation scheme employed to build the pyramid sets the dynamic properties of the pyramid (height, preservation of details, etc.). It corresponds to the vertical relations between the entities between two subsequent levels of the pyramid. Taking into account these two characteristics, pyramids can be classified as regular or irregular ones. Regular pyramids have a rigid structure where the relationships among nodes of the same level are fixed and the reduction ratio between the number of nodes of two consecutive levels are constant and bigger than one. In these hierarchies, the pyramid structure can be adapted to the image topology by means of the relationships among nodes of consecutive levels. As it was pointed out by Bister et al (1990) or Antonisse (1982), this is typically inadequate to preserve the topology of the input image at higher levels of the hierarchy. On the other hand, irregular pyramids can adapt their spatial relationships (among nodes of the same level and among nodes of consecutive levels) to the content of the input image. Besides, although the original proposals presented a serious drawback with respect to computational efficiency, this problem has been resolved by the last proposed strategies (Brun and Kropatsch, 2003, Kropatsch, 2005).

H.J. Antonisse. Image segmentation in pyramids. Comput. Graphics Image Process. 19: 367–383, 1982.

M. Bister, J. Cornelis, and A. Rosenfeld. A critical view of pyramid segmentation algorithms. Pattern Recognition Letters 11: 605–617, 1990.

S. Borra and S. Sarkar. A framework for performance characterization of intermediate-level grouping modules. Pattern Recognition and Image Analysis 19(11): 1306–1312, 1997.

L. Brun and W.G. Kropatsch. Construction of combinatorial pyramids. In M. Vento, E. Hancock (Ed.), GBRPR2003, LNCS 2726, 1–12, Springer, 2003.

P.F. Felzenszwalb and D.P. Huttenlocher. Image Segmentation Using Local Variation. In IEEE Conf. Computer Vision Pattern Recognition, 98–104, 1998.

W. Kropatsch, Y. Haxhimusa, Z. Pizlo, and G. Langs. Vision pyramids that do not grow too high. Pattern Recognition Letters 26(3): 319–337, 2005.

J. Shi and J. Malik. Normalized Cuts and Image Segmentation. In IEEE Conf. Computer Vision and Pattern Recognition, 731–737, 1997.

 

Last modification: 19.11.2010