Capturing video structure with mixture of probabilistic index maps

Perina, Alessandro; Cristani, Marco; Murino, Vittorio; Jojic, N.

The ability to separate foreground from background in video is useful in a number of applications, including video compression, human-computer interaction, and object tracking. Generating such a segmentation in a manner that is both reliable and visually pleasing requires fusing spatial and temporal information. This fusion typically involves processing a large amount of data, which imposes a heavy computational cost, requires substantial manual interaction, or both, and this cost limits the applicability of such segmentation. In this paper, a generative model that addresses this problem is proposed. The model is designed with particular emphasis on efficiency, while still providing visually pleasing results. The approach selects, in an unsupervised way, salient appearance poses of the foreground that are shared across the entire sequence, and uses them to better extract the foreground from individual frames. Results demonstrate the validity of the approach.