Jpetiot / January 8, 2011


The REPERE challenge falls within the objectives of the Content and Interaction Program of the Agence Nationale de la Recherche (ANR), in partnership with the Direction Générale de l’Armement (DGA). REPERE aims to evaluate people recognition in television programs. An evaluation is organized annually, with a test in January. Three consortia are funded for a period of 36 months. Their goal is to build a system for recognizing people in audiovisual programs.

The different sources of information are:

  • the image in which people are visible,
  • the text in which the names of people appear,
  • the soundtrack in which the voices of speakers are recognizable,
  • the content of the speech signal in which the names of people are pronounced.

The campaign pursues several objectives in developing a multimodal system for people recognition:

  • study the impact of each modality on system performance,
  • develop strategies for multimodal fusion,
  • create a methodology for data annotation,
  • create metrics for measuring system performance.

The consortia

The three consortia are composed of French, Swiss, and German public and private research organisations.

Consortium SODA

Laboratoire d’Informatique de l’Université du Maine (LIUM)  – Idiap Research Institute 

Consortium QCOMPERE

Laboratoire d’Informatique pour la Mécanique et les sciences de l’Ingénieur (LIMSI) – Centre de Recherche INRIA Grenoble Rhône-Alpes – Laboratoire d’Informatique de Grenoble – YACAST – Vocapia Research – Groupe de Recherche en informatique, image, automatique et instrumentation de Caen (GREYC) – Karlsruhe Institute of Technology (KIT)

Consortium PERCOL

Laboratoire d’Informatique Fondamentale de Marseille (LIF) – Université d’Avignon et des Pays de Vaucluse (UAPV) – Laboratoire d’Informatique Fondamentale de Lille (LIFL) – France Télécom


Laboratoire National de Métrologie et d’Essais (LNE) – IRIT (subcontractor of LNE)


During this campaign the following tasks are evaluated.

A primary task, which aims to answer both questions:

  • Who is speaking?
  • Who do we see?

Basic tasks, both multimodal (multiple sources of information) and monomodal (only one source of information available):

  • Who is speaking?
  • Who do we see?
  • Who are the people whose name is pronounced?
  • Who are the people whose name appears in the image?


  • detection and segmentation of heads,
  • transcription of speech,
  • audio diarization,
  • detection and segmentation of inlaid words.

After the annual evaluation in January, a two-day final workshop brings together all participants. The results are presented, followed by a discussion of areas for improvement.

People involved in the SAMOVA team

The SAMOVA team is involved in the evaluation of head segmentation, detection, and tracking, and of inlaid word detection.

  • Philippe Joly
  • Christine Sénac


  • Start time: June 2011
  • End time: May 2014
