The IV group (Inférence Visuelle, i.e. visual inference) works on the analysis of visual objects such as computer images, videos, and 3D models. Our work follows two main streams: computer vision and 3D modeling, and user-centered multimedia applications.
Computer vision and 3D modeling. The IV group addresses these topics both theoretically and practically. We achieved several new results using both geometric and photometric approaches, at all levels of the processing pipeline. At the earliest stage, 2D features (e.g. points, regions, superpixels) must be detected in input images. We investigated the detection of ellipse arcs and polygonal chains in an a contrario framework, through a pipeline comprising three steps: candidate selection, candidate validation, and model selection. We also worked on image segmentation by fusing photometric and geometric criteria, introducing original approaches for fine-structure detection in road images and for urban scene segmentation.
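The a contrario validation step mentioned above can be sketched as follows. A candidate (e.g. an ellipse arc supported by k of n boundary points) is validated when its Number of False Alarms (NFA) falls below a threshold, i.e. when such a candidate is unlikely under a background noise model. The noise probability p and the number of tests are illustrative assumptions, not values from our pipeline:

```python
from math import comb

def nfa(n_tests, k, n, p):
    """Number of False Alarms: expected count of candidates at least as
    good as one with k supporting points out of n, under a background
    model where each point supports the candidate with probability p."""
    tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return n_tests * tail

def is_meaningful(n_tests, k, n, p, eps=1.0):
    """A candidate is validated ("eps-meaningful") when fewer than eps
    such detections are expected by chance; eps = 1 is the usual choice."""
    return nfa(n_tests, k, n, p) < eps
```

With 1000 tested candidates and p = 0.5, a candidate supported by all 10 of its points is meaningful, while one supported by only 5 is not: the binomial tail is too large to rule out chance.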
At a mid-level stage, feature matching aims to provide stereo correspondences. Regarding dense matching based on correlation, our work mainly focuses on photometric aspects, combining a proposed measure that is robust to occlusions with a classical correlation measure that is efficient in non-occluded textured areas [15352,12177].
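As a minimal sketch of the classical correlation half of such a scheme, the following computes zero-mean normalized cross-correlation (ZNCC) between patches and selects the winner-takes-all disparity along a rectified scanline. The robust occlusion-aware measure from our work is not reproduced here; the patch size and disparity range are illustrative:

```python
def zncc(p, q):
    """Zero-mean normalized cross-correlation between two equal-size
    patches (flattened intensity lists). Returns a score in [-1, 1]."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    dp = sum((a - mp) ** 2 for a in p) ** 0.5
    dq = sum((b - mq) ** 2 for b in q) ** 0.5
    if dp == 0 or dq == 0:
        return 0.0  # textureless patch: correlation is undefined
    return num / (dp * dq)

def best_disparity(left_row, right_row, x, half, max_disp):
    """Winner-takes-all disparity at column x of a rectified scanline
    pair: the left patch at x is compared to right patches at x - d."""
    patch = left_row[x - half: x + half + 1]
    best, best_d = -2.0, 0
    for d in range(min(max_disp, x - half) + 1):
        cand = right_row[x - d - half: x - d + half + 1]
        s = zncc(patch, cand)
        if s > best:
            best, best_d = s, d
    return best_d
```

In a combined scheme, the ZNCC score would be kept in textured, non-occluded areas and replaced by the robust measure where correlation is unreliable.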
At the final stage comes the task of 3D reconstruction, which is the “inverse problem” of recovering the shapes of objects in a scene and/or the pose of the camera. Regarding single-view reconstruction, we investigated uncalibrated photometric stereo, an active vision technique that recovers the 3D model of an object from several images taken from the same viewpoint but under different lightings from unknown sources. We proposed new approaches and developed an efficient integration method to compute a 3D shape from a normal field. Application-wise, we designed a new 3D face capture system robust to outliers such as shadows or specular highlights.
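In the calibrated Lambertian setting, the per-pixel core of photometric stereo is a small least-squares problem: given known light directions and the observed intensities, recover the albedo-scaled normal. The following is a minimal self-contained sketch (a tiny 3x3 solver plus the normal equations); the uncalibrated case studied in our work additionally estimates the lightings and is not shown:

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination
    with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(3):
            if r != c and M[c][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def lambertian_normal(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from >= 3 images under
    known light directions: solve L g = i in the least-squares sense via
    the normal equations L^T L g = L^T i, where g = albedo * normal."""
    A = [[sum(l[r] * l[c] for l in lights) for c in range(3)] for r in range(3)]
    b = [sum(l[r] * i for l, i in zip(lights, intensities)) for r in range(3)]
    g = solve3(A, b)
    rho = sum(x * x for x in g) ** 0.5   # albedo = |g|
    n = [x / rho for x in g] if rho else [0.0, 0.0, 1.0]
    return rho, n
```

The recovered normal field is then integrated into a depth map, which is the role of the integration method mentioned above.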
Finally, using an analysis-by-synthesis approach, we proposed a new method to generate a 3D model of plants from a single image, relying on a priori knowledge of the plant species. We investigated the so-called “multi-view geometry” constraints describing the purely geometrical relations between the matched 2D features within a collection of views, the cameras, and the scene, as well as some projective 3D objects, e.g., the absolute conic or pencils of confocal quadrics. By relating projective and Euclidean geometries, direct applications of this work fostered new extensions of the uncalibrated structure-from-motion pipeline. For more general 3D modeling of reconstructed objects, we proposed a framework for the detection of local similarities in free-form parametric models (NURBS-based B-reps), in which patches similar up to an approximate isometry are identified.
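The most basic multi-view geometry constraint is the epipolar one: a true correspondence (x, x') between two views satisfies x'^T F x = 0 for the fundamental matrix F. A minimal illustration, using the F of a rectified pair (pure horizontal translation) as an assumed example:

```python
def epipolar_residual(F, x, xp):
    """Algebraic residual x'^T F x for a putative correspondence
    (x, x') in homogeneous coordinates; ~0 for a true match."""
    Fx = [sum(F[r][c] * x[c] for c in range(3)) for r in range(3)]
    return sum(xp[r] * Fx[r] for r in range(3))

# Fundamental matrix of a rectified stereo pair (translation along x):
# matches must lie on the same scanline.
F_RECTIFIED = [[0, 0, 0],
               [0, 0, -1],
               [0, 1, 0]]
```

In a structure-from-motion pipeline, residuals like this one (or their geometric counterparts, the distances to epipolar lines) are used to reject outlier matches before reconstruction.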
User-centered multimedia applications. A driving idea is to exploit user interactions with multimedia content in order to interpret, adapt, and simplify it. For example, interactions with a zoomable video are used to identify dynamic regions of interest; this information is then used for retargeting the video on light devices, that is, adapting the video to the smaller screen size by changing the viewpoint and zoom level. In another work, a client downloads a 3D model from a server and, while downloading, is shown a preview of the 3D object based on the partial 3D model; to that end, we adapt the progressive stream delivered to the client to a predefined viewpoint path. In both cases, we improve the users' quality of experience. Other multimedia work includes extracting image semantics from a hide-and-seek game, simplifying a rich 3D scene using previous users' renderings, and linking textual descriptions to 3D views from user interactions to enhance online 3D navigation.
The IV group has integrated new competences and strengthened its expertise in vision and interactive vision. Our goal is to combine our skills to tackle ambitious, multidisciplinary topics. More than ever, Augmented Reality (AR) can be seen as a common ground for developing future research mixing the different competences of the group, which range from computer vision to 3D modeling and visual interactions. For example, modern mobile devices provide user-friendly interactive interfaces that can ease the resolution of computer vision problems. Similarly, multimedia and reconstruction applications involving the analysis, generation, adaptation, or distribution of rich media form a field where the competences of the IV group can be combined.
Augmented reality. Visual special effects are a multi-disciplinary application area offering interesting open problems with both geometric and photometric issues, such as 3D modeling, camera tracking, and relighting, which are already active topics of our IV group. One particular aspect we want to investigate is the creation and analysis of a visual knowledge base built from images of the scene, which can be used to track the camera movement in the scene by querying the knowledge base with the current frame: the problem of real-time camera tracking is thus shifted to that of real-time camera localization. Following our past experience in AR for special effects (ANR ROM with INRIA and DURAN-DUBOI), we are starting a new collaboration on VFX previsualization using this knowledge-base approach, with the University of Oslo and a French industrial studio, Mikros Image.
Scene analysis based on images and 3D a priori knowledge is a related research direction with many interesting applications in Augmented Reality and mobile applications. A regional “Laperouse” project, starting in September 2014, aims at automatically detecting problems (such as vandalism or equipment failures) in urban environments from pictures or videos taken by users. The challenges to be addressed range from user localization based on urban images to object detection and recognition, in particular by developing 2D-3D matching algorithms that use both geometric (curvature, similarity, symmetry) and photometric (color, texture) aspects.
Analysis by synthesis in Multimedia and Computer Vision. Another original research direction is to use our skills and experience in user-centered multimedia applications to provide classical computer vision with prior knowledge, following an analysis-by-synthesis approach. This path is a possible way to partially bridge the semantic gap with a human-in-the-loop approach. As a follow-up to our work in crowdsourcing, a collaboration with NU Singapore conforming to this approach has been proposed in the context of the UMI IPAL. We also plan to use the additional crowdsourced knowledge to enhance 2D-to-3D correspondences; from 3D to 2D, the regional project mentioned above indexes images or videos with predefined 3D objects in urban environments; from 2D to 3D, taking into account additional hypotheses on the 3D model, for example from user interactions, can simplify a reconstruction process (a project with FittingBox and three other French academic partners is starting on this topic).
Another domain of application of the analysis-by-synthesis approach is 3D reconstruction using photometric stereo, an active research topic of the group. In this context, image pixels located in the shadow of the object are usually discarded, as they do not satisfy the Lambertian assumption. On the other hand, since shadows carry valuable information for 3D-shape interpretation, we propose to exploit this information by solving the problem through an analysis/synthesis loop: the shadows simulated in the synthesis step are expected to coincide with the shadows present in the images.
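The consistency check at the heart of such a loop can be illustrated on a toy 1-D height profile: cast shadows are simulated for the current shape estimate under a known light, then compared to the observed shadow mask. This is an illustrative sketch only; the light is assumed to come from the left with a given slope, and the agreement score is a plain pixel-wise fraction:

```python
def cast_shadows(height, light_slope):
    """Simulate cast shadows on a 1-D height profile lit from the left:
    point j is shadowed if some point i < j occludes it, i.e. the light
    ray leaving i passes above j."""
    n = len(height)
    shadow = [False] * n
    for j in range(n):
        for i in range(j):
            if height[i] - light_slope * (j - i) > height[j]:
                shadow[j] = True
                break
    return shadow

def shadow_agreement(simulated, observed):
    """Fraction of pixels where the simulated and observed shadow masks
    agree; the analysis/synthesis loop seeks a shape maximizing this."""
    return sum(s == o for s, o in zip(simulated, observed)) / len(observed)
```

In the actual loop, the shape estimate would be updated until the simulated shadows coincide with those observed in the images.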
Our goal is to group and unify our competences on these two challenging topics in the context of collaborative projects, in order to enhance both our scientific production and national and international visibility.
|Online galleries of 3D models typically provide two ways to preview a model before it is downloaded and viewed by the user: (i) by showing a set of thumbnail images of the 3D model taken from representative views (or keyviews); (ii) by showing a video of the 3D model as viewed from a moving virtual camera along a path determined by the content provider. We propose a third approach called preview streaming for mesh-based 3D objects: streaming and showing parts of the mesh surfaces visible along the virtual camera path.|
|An overview of the streaming process between client and server.|
|Transactions on Multimedia Computing, Communication and Applications, ACM, Vol. 10 N. 1, p. 13, 2014.|
|Solving inverse problems via uncalibrated approaches: camera critical motions, 2D Euclidean structures, a contrario ellipse detection, SfS ambiguity, Photometric Stereo via Total Variation. These lead to a wide range of applications: augmented reality and relighting, special effects in filmmaking, visual markers, ...|
|Photometric Stereo from a Single Camera Pose|
|Journal of Mathematical Imaging and Vision, Springer-Verlag, 2014|
|Plants are essential elements of virtual worlds for obtaining pleasant and realistic 3D environments. Even if mature computer vision techniques allow the reconstruction of challenging 3D objects from images, the high complexity of plant topology means that dedicated methods for generating 3D plant models must be devised. We propose an analysis-by-synthesis method that generates 3D models of a plant from both images and a priori knowledge of the plant species. Our method is based on a skeletonisation algorithm that generates a possible skeleton from a foliage segmentation. Then a 3D generative model, based on a parametric model of branching systems that takes botanical knowledge into account, is built. This method extends previous work by constraining the resulting skeleton to follow the hierarchical organisation of natural branching structures. A first instance of the 3D model is generated, and a reprojection of this model is compared with the original image. We then show that selecting the model from multiple proposals for the main branching structure of the plant and for the foliage improves the quality of the generated 3D model. By varying the parameter values of the generative model, we produce a series of candidate models; a criterion based on comparing the reprojection of the 3D virtual plant with the original image selects the best model.|
|On the left is the original tree picture; the other two pictures (on the right) show the computed model.|
|International Symposium on Visual Computing, Greece, ACM, 2013|