Audiovisual Signature

 

The main goal of our group on this topic is to define tools that can analyze the structure of the audio and video tracks jointly. Two main approaches are currently being explored:

  • The first approach aims at revealing temporal relationships between two types of events. An event is understood here as a "segment in which a given type of content can be observed", such as a given face, a given speaker, music, or a graphical icon. The temporal relation between two segments can be characterized by three numerical parameters, so each pair of segments taken from the outputs of two segmentation processes can be associated with one point in a 3D space. We therefore cast a vote for every temporal relation found between all the automatically identified segments. The distribution of votes in the resulting 3D matrix can then be used to extract several kinds of information, such as:
    • The association between a voice and a face belonging to the same person,
    • The players belonging to the same team in a TV game show,
    • The identification of the host of a radio program,
    • Etc.
  • The second approach aims at defining a style similarity measure between two recordings. Its input can be any low-level feature that associates a series of numerical values with audiovisual content along the time axis. Local best matches between the values extracted from two different contents are then computed with an optimized algorithm, and the quality scores of all the matches are gathered in a 2D matrix. Here again, the distribution of the highest coefficients in this matrix characterizes two types of similarity:
    • Diagonal distributions correspond to strict similarity, which is observed when some parts of the compared documents are exactly identical.
    • Block-shaped distributions correspond to local style similarity: the content of the corresponding segments is not exactly the same but shares some common properties.
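The voting step of the first approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the group's implementation: the text does not say which three parameters describe a temporal relation, so the sketch hypothetically uses the start offset, the end offset and the duration of the first segment, quantized with an assumed `step`. All function names are hypothetical, and the 3D matrix is stored sparsely in a `Counter` rather than as a dense array.

```python
from collections import Counter
from typing import Iterable, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in seconds


def relation_params(a: Segment, b: Segment, step: float = 1.0) -> Tuple[int, int, int]:
    """Quantized temporal relation between segment a and segment b.

    Hypothetical choice of the three parameters: start offset (b.start - a.start),
    end offset (b.end - a.end) and duration of a, each divided by `step` and rounded.
    """
    (sa, ea), (sb, eb) = a, b
    return (round((sb - sa) / step), round((eb - ea) / step), round((ea - sa) / step))


def vote_matrix(segs_a: Iterable[Segment], segs_b: Iterable[Segment], step: float = 1.0) -> Counter:
    """Sparse 3D vote matrix: each (segment of A, segment of B) pair casts one vote."""
    segs_b = list(segs_b)
    votes: Counter = Counter()
    for a in segs_a:
        for b in segs_b:
            votes[relation_params(a, b, step)] += 1
    return votes


def project_offsets(votes: Counter) -> Counter:
    """Sum the votes over the duration axis to get a 2D (start offset, end offset) map."""
    plane: Counter = Counter()
    for (ds, de, _dur), n in votes.items():
        plane[(ds, de)] += n
    return plane


# Usage: segments of one speaker's voice vs. segments where one face is on screen.
voice = [(0.0, 5.0), (10.0, 14.0), (20.0, 26.0)]
face = [(0.2, 5.1), (9.8, 14.3), (20.1, 25.9)]
votes = vote_matrix(voice, face)
peak, count = project_offsets(votes).most_common(1)[0]
print(peak, count)  # prints (0, 0) 3: the two event types start and end together
```

A strong peak near zero start and end offsets suggests that the two event types co-occur, which is the kind of evidence that could support a voice/face association; other peak positions would capture other regularities, such as one event systematically following another.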

A global similarity measure can then be extracted from this similarity matrix and used, for example, for document classification.
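The similarity matrix of the second approach can be illustrated with a small sketch. The text does not describe its optimized local matching algorithm, so a plain sliding-window comparison is swapped in here: windows of an assumed length `win` are compared using a similarity of 1 / (1 + mean absolute difference). The diagonal-run detector, the block density and the global score (mean of the `k` highest coefficients) are hypothetical illustrations of the ideas above, not the group's actual measures.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def window_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Similarity in (0, 1]: 1 / (1 + mean absolute difference) of two equal-length windows."""
    diff = sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return 1.0 / (1.0 + diff)


def similarity_matrix(f: Sequence[float], g: Sequence[float], win: int) -> Matrix:
    """S[i][j] compares the window of f starting at i with the window of g starting at j."""
    n, m = len(f) - win + 1, len(g) - win + 1
    return [[window_similarity(f[i:i + win], g[j:j + win]) for j in range(m)] for i in range(n)]


def longest_diagonal_run(S: Matrix, thr: float) -> int:
    """Longest run of coefficients >= thr along any diagonal (strict similarity)."""
    n, m = len(S), len(S[0])
    best = 0
    for off in range(-(n - 1), m):
        run = 0
        for i in range(n):
            j = i + off
            if 0 <= j < m:
                run = run + 1 if S[i][j] >= thr else 0
                best = max(best, run)
    return best


def block_density(S: Matrix, rows: range, cols: range, thr: float) -> float:
    """Fraction of coefficients >= thr in a rectangle; a dense block hints at style similarity."""
    cells = [S[i][j] for i in rows for j in cols]
    return sum(v >= thr for v in cells) / len(cells)


def global_similarity(S: Matrix, k: int) -> float:
    """Hypothetical global score: mean of the k highest coefficients (k >= 1)."""
    flat = sorted((v for row in S for v in row), reverse=True)
    return sum(flat[:k]) / min(k, len(flat))


# Usage: two feature series that share an identical sub-part [2, 3, 4, 5].
f = [0, 1, 2, 3, 4, 5, 6, 7]
g = [9, 9, 2, 3, 4, 5, 9, 9]
S = similarity_matrix(f, g, win=2)
print(longest_diagonal_run(S, thr=0.99), global_similarity(S, k=3))  # prints 3 1.0
```

Here the shared sub-part produces a diagonal of maximal coefficients (strict similarity) while the surrounding 3x3 region is only one-third filled, so it would not be read as a style block; two recordings with similar but not identical content would instead be expected to fill such a region more densely.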