Program Boundaries Detection
Very little research has so far addressed program boundary detection in TV broadcasts. All existing approaches rely on spatiotemporal modelling of the content combined with decision rules. At present, this is the only way to reach the semantic quality required by search engines, but only collections of recordings that share the same structure can benefit from such methods. Furthermore, methods based on models or decision rules are limited because each new collection requires either new training or new expertise, and often new tools have to be defined. In contrast to these methods, we use our generic audio-video segmentation approach (link to generic segmentation), which relies mainly on the hypothesis that segmenting into programs is equivalent to segmenting into homogeneous segments at the appropriate scale. We evaluate how the hypothesis that basic video and audio features take homogeneous values during a program can be exploited by a GLR-BIC segmentation algorithm to identify programs in days of television content. Homogeneity is measured by how well the feature values can be described by a Gaussian distribution. Starting from the results obtained with basic audio and video features, we evaluate the improvement brought by an early fusion of these features and by prior knowledge about the presence of monochromatic frames in commercials. The first step of our work was therefore to check whether typical video and audio features validate the above hypotheses. The second step was to identify possible limitations inherent to this approach and to propose solutions to overcome them.
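To make the homogeneity criterion more concrete, the following is a minimal sketch of a BIC-penalised likelihood-ratio test for a single change point. It is an illustration under simplifying assumptions, not the system described here: the feature is one-dimensional (the actual audio and video features are presumably multi-dimensional), and the names `delta_bic`, `best_change_point`, `margin` and `penalty_weight` are hypothetical.

```python
import math


def gaussian_log_likelihood(values):
    """Maximised log-likelihood of `values` under a single 1-D Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    var = max(var, 1e-8)  # variance floor, avoids log(0) on constant data
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def delta_bic(values, t, penalty_weight=1.0):
    """Gain of modelling values[:t] and values[t:] with two Gaussians
    instead of one, minus the BIC penalty. Positive means 'split here'."""
    n = len(values)
    gain = (gaussian_log_likelihood(values[:t])
            + gaussian_log_likelihood(values[t:])
            - gaussian_log_likelihood(values))
    # A second 1-D Gaussian adds 2 free parameters (mean and variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


def best_change_point(values, margin=5, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, 0.0) if no
    candidate has a positive score (the window looks homogeneous)."""
    best_t, best_score = None, 0.0
    for t in range(margin, len(values) - margin + 1):
        score = delta_bic(values, t, penalty_weight)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

In this sketch, a window whose values are well described by one Gaussian yields no positive score and is kept as a single segment. Raising `penalty_weight` makes the test more conservative, which is one plausible way to move towards the program scale rather than the shot scale.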
Tests were carried out on 120 hours of TV video recorded continuously over 5 days from a general-interest French TV channel (including various kinds of programs such as news, weather forecasts, talk shows, sports and sitcoms) at a frame rate of 25 frames per second. To evaluate our method, we used a metric introduced in the ARGOS evaluation campaign for temporal segmentation tools. Each detected segment is associated with the ground-truth segment that has the maximum temporal intersection with it. When the segmentation is a complete partition of a whole recording, recall and precision reduce to a single metric: the ratio between the sum of the intersection durations over all segments and the total duration of the ground-truth segments (here, the whole recording duration of 120 h). This metric highlights the ability of the segmentation tool to group together units belonging to the same segment, in contrast to the metric used, for example, in TRECVID to evaluate shot segmentation tools, which highlights the ability to localize transitions.
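The metric described above can be sketched as follows. This is one reading of the description, not the official ARGOS scoring code; segments are assumed to be `(start, end)` pairs in seconds, and the function names are hypothetical.

```python
def overlap(a, b):
    """Duration of the temporal intersection of two (start, end) segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def partition_score(detected, ground_truth):
    """Sum, over detected segments, of the intersection with the best-matching
    ground-truth segment, divided by the total ground-truth duration."""
    total = sum(end - start for start, end in ground_truth)
    matched = sum(
        max(overlap(seg, gt) for gt in ground_truth)  # max-intersection match
        for seg in detected
    )
    return matched / total
```

For example, with ground truth `[(0, 60), (60, 100)]` and detected segments `[(0, 50), (50, 100)]`, the first detected segment contributes 50 s and the second 40 s, giving a score of 0.9. A boundary misplaced by 10 s therefore costs 10% of this 100 s recording, regardless of how many boundaries there are.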
Visual system: only visual features were used.
Audio system: only audio features were used.
AV system: audio and video features were used together.
Improved system: on French television, advertisements are separated by a short sequence of monochrome images (white, blue or black). As this kind of effect is easy to detect, false program boundaries can be removed when they are suspected to fall within an advertising break.
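The post-filtering idea behind the improved system can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: frames are flat lists of grey-level values, separators closer than `max_gap` seconds are taken to belong to the same advertising break, and the thresholds and function names are hypothetical.

```python
def is_monochrome(pixels, max_std=2.0):
    """True if the pixel values of a frame are almost constant."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return var ** 0.5 <= max_std


def ad_break_intervals(mono_times, max_gap=60.0):
    """Group timestamps of monochrome separators into (start, end) intervals.
    Isolated separators are ignored, since one separator alone does not
    delimit a run of advertisements."""
    intervals = []
    for t in sorted(mono_times):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t  # extend the current break
        else:
            intervals.append([t, t])  # start a new candidate break
    return [(s, e) for s, e in intervals if e > s]


def filter_boundaries(boundaries, intervals):
    """Drop candidate program boundaries that fall strictly inside a break."""
    return [b for b in boundaries
            if not any(s < b < e for s, e in intervals)]
```

With separators at 100, 130, 155 and 900 s, the sketch infers one break from 100 to 155 s, so candidate boundaries at 120 and 140 s would be discarded while those at 50, 155 and 600 s are kept. A colour version would apply the same near-constant test per channel.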
Results of the different versions of program boundary detection on the 120 continuous hours of test data are shown in the following table.
- Elie El Khoury, Christine Senac, Philippe Joly. Unsupervised TV Program Boundaries Detection Based on Audiovisual Features. In: IEEE International Conference on Visual Information Engineering (VIE 2008), Xi’an, China, 29/07/08-01/08/08, 2008.