Shot Boundary Detection

Jpetiot / January 7, 2009 / Applications

Context

Nearly all audio and video segmentation methods rely on a priori knowledge. They are based on spatio-temporal modelling of the content and use decision rules. Currently, this is the only way to reach the semantic quality required by search engines. However, only collections of recordings that are highly structured and homogeneous in terms of production, such as broadcast news and sports programmes, can benefit from such methods. Furthermore, methods based on models or decision rules are limited because each new collection requires either new training or new expertise, and new tools often have to be defined.

In contrast, our method requires no a priori knowledge and can easily be exploited by the upper layers. It appears robust and is generic: without any prior knowledge, it distinguishes true shot boundaries from fast object or camera motion, fast illumination changes, reflections, and sudden changes due to explosions or flash photography. Results obtained on the corpora of the ARGOS and TRECVID campaigns are among the best.

Overview

Our goal is not to build a system that outperforms those that have existed for several years, which use many kinds of features and handle each special effect separately with a great deal of a priori knowledge.

Instead, we build a basic system that uses only RGB colour values, as follows:

  • divide each image into 4 equal parts;
  • compute the mean R, G and B values in each part;
  • concatenate them into a vector of dimension 12 (one vector per image);
  • apply the generic segmentation: points of change are well detected, but some false alarms appear;
  • remove the false boundaries by comparing histograms (with the city-block distance).
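
The feature extraction and the histogram comparison above can be sketched as follows. This is only an illustration under stated assumptions, not the system described here: the four parts are assumed to be quadrants, frames are assumed to be nested lists of (R, G, B) tuples with 8-bit values, and the function names and the number of histogram bins are hypothetical. The generic segmentation step is not specified in this text and is left out.

```python
def region_mean_rgb(frame, rows, cols):
    """Mean (R, G, B) over the pixels frame[r][c] for r in rows, c in cols."""
    pixels = [frame[r][c] for r in rows for c in cols]
    n = len(pixels)
    return [sum(p[ch] for p in pixels) / n for ch in range(3)]


def frame_descriptor(frame):
    """12-dimensional vector: mean R, G, B of each of the 4 parts.

    Assumption: the 4 parts are the quadrants of the image.
    """
    h, w = len(frame), len(frame[0])
    top, bottom = range(0, h // 2), range(h // 2, h)
    left, right = range(0, w // 2), range(w // 2, w)
    vec = []
    for rows in (top, bottom):
        for cols in (left, right):
            vec.extend(region_mean_rgb(frame, rows, cols))
    return vec


def rgb_histogram(frame, bins=8):
    """Concatenated per-channel histogram, normalised to sum to 1.

    Assumption: 8-bit channel values; `bins` is an illustrative choice.
    """
    hist = [0.0] * (3 * bins)
    n = 0
    for row in frame:
        for pixel in row:
            n += 1
            for ch in range(3):
                hist[ch * bins + pixel[ch] * bins // 256] += 1
    return [v / (3 * n) for v in hist]


def city_block(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A candidate boundary could then be kept or rejected by comparing `city_block(rgb_histogram(before), rgb_histogram(after))` against a threshold. The threshold value is not given in this text and would have to be chosen on held-out data.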

Evaluation

Tests were carried out on the ARGOS campaign corpus (20 hours). To evaluate our method, we used two metrics: the metric introduced by the ARGOS evaluation campaign for temporal segmentation tools, and the TREC metric. Results are shown in the following table.

Metric  Measure    All corpora  INA corpus (news)  SFRS corpus (documentaries)
ARGOS   Recall     0.935        0.939              0.931
ARGOS   Precision  0.931        0.934              0.928
ARGOS   F-measure  0.933        0.936              0.929
TREC    Recall     0.893        0.897              0.883
TREC    Precision  0.918        0.941              0.874
TREC    F-measure  0.905        0.918              0.878
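
The F-measure values in the table are consistent with the usual harmonic mean of precision and recall. The short check below assumes that standard definition (the text does not state the formula), and `f_measure` is an illustrative name.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the "All corpora" column of the table.
argos_all = f_measure(0.931, 0.935)
trec_all = f_measure(0.918, 0.893)

print(round(argos_all, 3))  # 0.933
print(round(trec_all, 3))   # 0.905
```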
