Visualization and Perception Lab

Prominent Moving Object Segmentation from Moving Camera Video Shots Using Iterative Energy Minimization

Chiranjoy Chattopadhyay and Sukhendu Das
Visualization and Perception Lab
Department of Computer Science and Engineering, Indian Institute of Technology, Madras, India

Accepted with minor revision in Signal, Image and Video Processing (SIViP), Springer

Abstract

Extraction of the moving foreground object from a given video shot is an important task for spatio-temporal analysis and content representation in many computer vision and digital video processing applications. We propose an iterative framework based on energy minimization for efficiently segmenting the prominent moving foreground object from moving camera video (MCV) shots. The solution obtained using graph-cut for figure-ground classification is enhanced using features extracted over a set of neighboring frames. This is used to iteratively update the foreground and background probability (tri-) maps and hence the graph weights. The segmentation results from neighboring frames are integrated as constraints to iteratively guide the energy minimization process toward an efficient solution. The proposed framework is automatic and requires no human interaction (neither initialization nor refinement). Our method outperforms recent state-of-the-art moving object segmentation techniques on benchmark datasets with MCV shots.

Qualitative Segmentation Results (Videos). Click the links below to view and download the videos (each opens in a new page).

Set1 (Bike, Boat)	Set4 (Tennis, Girl)
Set2 (Cheetah, People) (Multiple Moving Objects)	Set5 (Soldier, Frog) (SegTrack V2)
Set3 (Birdfall, Parachute)	Set6 (Car, Aeroplane)
Set7 (Train)

The segmented foreground object is demarcated with a unique color. In selected videos, the background pixels are overlaid with a suitable color while the foreground pixels are left unaltered, to highlight the segmentation results.
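The background-overlay style of visualization described above can be sketched as follows. This is only an illustration, not the code used to produce the videos: the function name `overlay_background`, the tint color, and the blending weight `alpha` are assumptions chosen for the example.

```python
import numpy as np

def overlay_background(frame, fg_mask, color=(0, 0, 255), alpha=0.5):
    """Tint background pixels and leave foreground pixels unchanged.

    frame   : H x W x 3 array of 8-bit color values
    fg_mask : H x W array, nonzero/True where the pixel is foreground
    color   : tint applied to the background (hypothetical default)
    alpha   : blending weight of the tint, between 0 and 1 (hypothetical default)
    """
    frame = np.asarray(frame, dtype=np.float64)
    fg = np.asarray(fg_mask, dtype=bool)
    tint = np.array(color, dtype=np.float64)

    out = frame.copy()
    # Blend only the background pixels with the tint color.
    out[~fg] = (1.0 - alpha) * frame[~fg] + alpha * tint
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Calling `overlay_background(frame, mask)` on each frame with its segmentation mask yields frames in which only the background is tinted.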

Quantitative comparison of segmentation performance with other methods using average F-measure (higher is better).
Video	Proposed	Elqursh et al. (ECCV 2012)	Kwak et al. (ICCV 2011)	Sheikh et al. (ICCV 2009)	Yong et al. (ICCV 2011)	Papazoglou et al. (ICCV 2013)
Cars1	0.95	0.91	0.88	0.77	0.70	0.94
People 1	0.94	0.89	0.90	0.70	0.75	0.94
People 2	0.88	0.77	0.87	0.78	0.59	0.85
Tennis	0.92	0.89	-	0.41	0.72	0.90
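For readers unfamiliar with the metric, the sketch below shows one common way to compute an average F-measure from binary masks: the harmonic mean of precision and recall on foreground pixels, averaged over frames. It assumes per-frame averaging and boolean foreground masks; the exact evaluation protocol behind the table is not stated here, and the function names are illustrative.

```python
import numpy as np

def f_measure(pred, gt):
    """F-measure of a predicted foreground mask against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)

    tp = np.count_nonzero(pred & gt)    # foreground correctly detected
    fp = np.count_nonzero(pred & ~gt)   # background labeled as foreground
    fn = np.count_nonzero(~pred & gt)   # foreground that was missed

    if tp == 0:
        # Convention assumed here: no true positives gives a score of 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def average_f_measure(pred_masks, gt_masks):
    """Mean of the per-frame F-measure over a video shot (assumed protocol)."""
    scores = [f_measure(p, g) for p, g in zip(pred_masks, gt_masks)]
    if not scores:
        raise ValueError("at least one frame is required")
    return float(np.mean(scores))
```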

Comparison of segmentation error, measured as the average number of erroneously labeled pixels per frame (lower is better).
Video	Proposed	Papazoglou et al. (ICCV 2013)	Yong et al. (ICCV 2011)	Zhang et al. (CVPR 2013)	Ma et al. (CVPR 2012)	Tsai et al. (IJCV 2012)
Birdfall	152	217	288	155	189	252
Girl	1300	3859	1785	1488	1698	1304
Parachute	195	855	201	220	221	235
Cheetah	625	890	905	633	806	1142
Monkey	265	284	521	365	472	563
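The error metric in this table can be sketched as follows: count the pixels whose predicted label differs from the ground truth in each frame, then average over the frames of the shot. This is a minimal illustration assuming binary foreground masks of equal size; the function name is hypothetical.

```python
import numpy as np

def average_pixel_error(pred_masks, gt_masks):
    """Average number of mislabeled pixels per frame."""
    errors = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        # XOR marks every pixel where the two labelings disagree.
        errors.append(np.count_nonzero(pred ^ gt))
    if not errors:
        raise ValueError("at least one frame is required")
    return float(np.mean(errors))
```

Because the count is in absolute pixels, values are comparable only between methods on the same video, not across videos with different resolutions or object sizes.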