Maxime Vono

Welcome

This website will soon be outdated. Please see my new webpage.

I have been a Ph.D. student in Toulouse (France) since October 2017, supervised by Nicolas Dobigeon and Pierre Chainais, within the SC group of the IRIT laboratory. I am also affiliated with the ORION-B project.

Prior to that, I graduated from Ecole Centrale de Lille, majoring in data science. I also hold an M.Sc. degree in applied mathematics from the University of Lille.

I work on Monte Carlo methods for statistical machine learning and signal processing. I am particularly interested in the connections between optimisation and simulation-based approaches.

News/Events:
September 2020 Our paper has been accepted for publication in the Journal of Computational and Graphical Statistics.
April 2020 Major revision of our paper on the theoretical analysis of the Split Gibbs Sampler.
2018-2019 Data science/analytics consultancy missions for a large retailer.

Research

Preprints

  1. Quantitative inference of the H2 column densities from 3 mm molecular emission: A case study towards Orion B
    P. Gratier et al.

    Since molecular hydrogen is unobservable in cold molecular clouds, column density measurements of molecular gas currently rely either on dust emission observations in the far-IR or on star counting. (Sub-)millimeter observations of numerous trace molecules are effective from ground-based telescopes, but the relationship between the emission of one molecular line and the H2 column density (NH2) is non-linear and sensitive to excitation conditions, optical depths, and abundance variations due to the underlying physico-chemistry. We aim to use multi-molecule line emission to infer NH2 from radio observations. We propose a data-driven approach to determine NH2 from radio molecular line observations. We use supervised machine learning methods (Random Forests) on wide-field hyperspectral IRAM-30m observations of the Orion B molecular cloud to train a predictor of NH2, using a limited set of molecular lines as input and the Herschel-based dust-derived NH2 as the ground-truth output. For conditions similar to the Orion B molecular cloud, we obtain predictions of NH2 within a typical factor of 1.2 of the Herschel-based estimates. An analysis of the contributions of the different lines to the predictions shows that the most important lines are 13CO(1-0), 12CO(1-0), C18O(1-0), and HCO+(1-0). A detailed analysis distinguishing between diffuse, translucent, filamentary, and dense core conditions shows that the importance of these four lines depends on the regime, and that it is recommended to add the N2H+(1-0) and CH3OH(20-10) lines for the prediction of NH2 in dense core conditions. This article opens a promising avenue to directly infer important physical parameters from molecular line emission in the millimeter domain. The next step will be to try to infer several parameters simultaneously (e.g., NH2 and the far-UV illumination field) to further test the method. [Abridged]

            @article{Gratier_2020,
            author = {Pierre Gratier and Jérôme Pety and Emeric Bron and Antoine Roueff and Jan H. Orkisz and Maryvonne Gerin and Victor de Souza Magalhaes and Mathilde Gaudel and Maxime Vono and Sébastien Bardeau and Jocelyn Chanussot and Pierre Chainais and Javier R. Goicoechea and Viviana V. Guzmán and Annie Hughes and Jouni Kainulainen and David Languignon and Jacques Le Bourlot and Franck Le Petit and François Levrier and Harvey Liszt and Nicolas Peretto and Evelyne Roueff and Albrecht Sievers},
            year = {2020},
            title = {Quantitative inference of the H2 column densities from 3 mm molecular emission: A case study towards Orion B},
            journal = {arXiv preprint arXiv:2008.13417}
            }
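
As a rough illustration of the paper's approach, here is a minimal sketch of a Random-Forest predictor of NH2 trained on a few line intensities. The data are synthetic stand-ins (not the IRAM-30m/Herschel maps used by the authors), and the "typical factor" metric mirrors the factor-of-1.2 figure quoted above:

```python
# Hedged sketch: a Random-Forest predictor of N(H2) from a few line
# intensities, in the spirit of the paper. The data here are synthetic
# stand-ins, not the IRAM-30m / Herschel maps used by the authors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical "line intensities" for 13CO, 12CO, C18O, HCO+ (arbitrary units)
log_nh2 = rng.uniform(20.5, 23.0, n)         # log10 column density (ground truth)
lines = np.column_stack([
    log_nh2 + 0.1 * rng.standard_normal(n),  # 13CO(1-0): tight correlation
    log_nh2 + 0.3 * rng.standard_normal(n),  # 12CO(1-0): saturates -> noisier
    log_nh2 + 0.2 * rng.standard_normal(n),  # C18O(1-0)
    rng.standard_normal(n),                  # HCO+(1-0): uninformative here
])

X_tr, X_te, y_tr, y_te = train_test_split(lines, log_nh2, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# "Typical factor" of the prediction in linear units: 10**median(|delta log10|)
factor = 10 ** np.median(np.abs(rf.predict(X_te) - y_te))
print(factor)                    # close to 1 when the inputs are informative
print(rf.feature_importances_)   # per-line contribution to the prediction
```

The `feature_importances_` attribute is what underlies statements like "the most important lines are 13CO(1-0), ..." in the abstract.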
          
  2. Efficient MCMC sampling with dimension-free convergence rate using ADMM-type splitting
    M. Vono*, D. Paulin*, and A. Doucet
    * equal contribution

    Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large datasets and high-dimensional models. A standard approach to mitigate this complexity consists in using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated ADMM optimization algorithm. These methods appear to provide empirically state-of-the-art performance but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations.

            @article{Vono_Paulin_Doucet_2019,
            author = {Vono, Maxime and Paulin, Daniel and Doucet, Arnaud},
            year = {2019},
            title = {Efficient MCMC sampling with dimension-free convergence rate using ADMM-type splitting},
            journal = {arXiv preprint arXiv:1905.11937}
            }
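
To give a flavor of the split Gibbs sampler analyzed in this paper, here is a minimal sketch on a toy one-dimensional target. The splitting of f and g and the coupling parameter rho follow the generic scheme; the Gaussian choices of f and g are illustrative assumptions that make both conditionals exact:

```python
# Hedged sketch of the split Gibbs sampler (SGS) idea on a toy 1-D target
# pi(x) ~ exp(-f(x) - g(x)) with f(x) = x^2/2 and g(x) = (x-4)^2/2, i.e. N(2, 1/2).
# SGS targets the augmented density
#   pi_rho(x, z) ~ exp(-f(x) - g(z) - (x - z)^2 / (2 rho^2)),
# whose x-marginal approaches pi as rho -> 0; both conditionals are Gaussian here.
import numpy as np

rng = np.random.default_rng(1)
rho2 = 0.2 ** 2
n_iter, burn = 20000, 1000
x, z = 0.0, 0.0
samples = []
for t in range(n_iter):
    prec = 1.0 + 1.0 / rho2
    # x | z ~ exp(-x^2/2 - (x - z)^2 / (2 rho^2)): Gaussian
    x = rng.normal((z / rho2) / prec, prec ** -0.5)
    # z | x ~ exp(-(z - 4)^2/2 - (x - z)^2 / (2 rho^2)): Gaussian
    z = rng.normal((4.0 + x / rho2) / prec, prec ** -0.5)
    if t >= burn:
        samples.append(x)

samples = np.array(samples)
print(samples.mean(), samples.var())  # close to 2 and 0.5 for small rho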
          

Journal papers

  1. Asymptotically exact data augmentation: models, properties and algorithms
    M. Vono, N. Dobigeon, and P. Chainais
    Journal of Computational and Graphical Statistics (in press), 2020

    Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique to improve convergence properties, simplify the implementation, or reduce the computational time of inference methods such as Markov chain Monte Carlo (MCMC) algorithms. Nonetheless, there is no systematic way to introduce appropriate auxiliary variables while preserving the initial target probability distribution and offering computationally efficient inference. To deal with such issues, this paper studies a unified framework, coined asymptotically exact data augmentation (AXDA), which encompasses both well-established and more recent approximate augmented models. In a broader perspective, this paper shows that AXDA models benefit from interesting statistical properties and yield efficient inference algorithms. In non-asymptotic settings, the quality of the proposed approximation is assessed with several theoretical results, which are illustrated on standard statistical problems. Supplementary materials, including computer code for this paper, are available online.

            @article{Vono_AXDA_2020,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            year = {2020},
            title = {Asymptotically exact data augmentation: models, properties and algorithms},
            journal = {Journal of Computational and Graphical Statistics},
            note = {To appear.}
            }
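
As a brief sketch of the idea, using the Gaussian smoothing kernel that is one of the standard choices discussed in this line of work (not the paper's most general formulation), AXDA replaces a target pi(theta) with an augmented density whose theta-marginal recovers pi as the tolerance parameter rho vanishes:

```latex
% AXDA with a Gaussian smoothing kernel (an illustrative special case)
\pi_{\rho}(\theta, z) \propto \pi(z)\,
  \exp\!\left(-\frac{\|\theta - z\|^{2}}{2\rho^{2}}\right),
\qquad
\pi_{\rho}(\theta) = \int \pi_{\rho}(\theta, z)\, dz
  \;\xrightarrow[\rho \to 0]{}\; \pi(\theta).
```

Sampling (theta, z) jointly, e.g. by Gibbs steps on the two conditionals, is what connects AXDA to the splitting-based samplers listed elsewhere on this page.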
          
  2. Tracers of the ionization fraction in dense and translucent gas. I. Automated exploitation of massive astrochemical model grids
    E. Bron et al.
    Astronomy & Astrophysics (in press), 2020

    The ionization fraction plays a key role in the physics and chemistry of the neutral interstellar medium, from controlling the coupling of the gas to the magnetic field to allowing fast ion-neutral reactions that drive interstellar chemistry. Most estimations of the ionization fraction have relied on deuterated species such as DCO+, whose detection is limited to dense cores representing an extremely small fraction of the volume of the giant molecular clouds they are part of. As large field-of-view hyperspectral maps become available, new tracers may be found. We search for the best observable tracers of the ionization fraction based on a grid of astrochemical models. We build grids of models that sample randomly a large space of physical conditions (unobservable quantities such as gas density, temperature, etc.) and compute the corresponding observables (line intensities, column densities) and the ionization fraction. We estimate the predictive power of each potential tracer by training a Random Forest model to predict the ionization fraction from that tracer, based on these model grids. In both translucent medium and cold dense medium conditions, several observable tracers with very good predictive power for the ionization fraction are found. Several tracers in cold dense medium conditions are found to be better and more widely applicable than the traditional DCO+/HCO+ ratio. We also provide simpler analytical fits for estimating the ionization fraction from the best tracers, and for estimating the associated uncertainties. We discuss the limitations of the present study and select a few recommended tracers in both types of conditions. The method presented here is very general and can be applied to the measurement of any other quantity of interest (cosmic ray flux, elemental abundances, etc.) from any type of model (PDR models, time-dependent chemical models, etc.). (abridged)

            @article{Bron_AA_2020,
            author = {Bron, Emeric and Roueff, Evelyne and Gerin, Maryvonne and Pety, Jérôme and Gratier, Pierre and Le Petit, Franck and Guzman, Viviana and Orkisz, Jan H. and de Souza Magalhaes, Victor and Gaudel, Mathilde and Vono, Maxime and Bardeau, Sébastien and Chainais, Pierre and Goicoechea, Javier R. and Hughes, Annie and Kainulainen, Jouni and Languignon, David and Le Bourlot, Jacques and Levrier, François and Liszt, Harvey and Öberg, Karin and Peretto, Nicolas and Roueff, Antoine and Sievers, Albrecht},
            year = {2020},
            title = {Tracers of the ionization fraction in dense and translucent gas: {I}. {A}utomated exploitation of massive astrochemical model grids},
            journal = {Astronomy \& Astrophysics},
            note = {To appear}
            }
          
  3. C18O, 13CO, and 12CO abundances and excitation temperatures in the Orion B molecular cloud
    A. Roueff et al.
    Astronomy & Astrophysics (in press), 2020

    CO isotopologue transitions are routinely observed in molecular clouds to probe the column density of the gas and the elemental ratios of carbon and oxygen, and to trace the kinematics of the environment. We aim to estimate the abundances, excitation temperatures, velocity field, and velocity dispersions of the three main CO isotopologues towards a subset of the Orion B molecular cloud. We use the Cramér-Rao bound (CRB) technique to analyze and estimate the precision of the physical parameters in the framework of local-thermodynamic-equilibrium excitation and radiative transfer with additive white Gaussian noise. We propose a maximum likelihood estimator to infer the physical conditions from the 1-0 and 2-1 transitions of CO isotopologues. Simulations show that this estimator is unbiased and efficient for a common range of excitation temperatures and column densities (Tex > 6 K, N > 1e14 - 1e15 cm-2). Contrary to common assumptions, the different CO isotopologues have distinct excitation temperatures, and the line intensity ratios between different isotopologues do not accurately reflect the column density ratios. We find mean fractional abundances that are consistent with previous determinations towards other molecular clouds. However, significant local deviations are inferred, not only in regions exposed to the UV radiation field but also in shielded regions. These deviations result from the competition between selective photodissociation, chemical fractionation, and depletion on grain surfaces. We observe that the velocity dispersion of the C18O emission is 10% smaller than that of 13CO. The substantial gain resulting from the simultaneous analysis of two different rotational transitions of the same species is rigorously quantified. The CRB technique is a promising avenue for analyzing the estimation of physical parameters from the fit of spectral lines.

            @article{Roueff_AA_2020,
            author = {Roueff, Antoine and Gerin, Maryvonne and Gratier, Pierre and Levrier, Francois and Pety, Jerome and Gaudel, Mathilde and Goicoechea, Javier R. and Orkisz, Jan H. and de Souza Magalhaes, Victor and Vono, Maxime and Bardeau, Sebastien and Bron, Emeric and Chanussot, Jocelyn and Chainais, Pierre and Guzman, Viviana V. and Hughes, Annie and Kainulainen, Jouni and Languignon, David and Le Bourlot, Jacques and Le Petit, Franck and Liszt, Harvey S. and Marchal, Antoine and Miville-Deschenes, Marc-Antoine and Peretto, Nicolas and Roueff, Evelyne and Sievers, Albrecht},
            year = {2020},
            title = {C18O, 13CO, and 12CO abundances and excitation temperatures in the {O}rion {B} molecular cloud: {A}n analysis of the precision achievable when modeling spectral line within the {L}ocal {T}hermodynamic {E}quilibrium approximation},
            journal = {Astronomy \& Astrophysics},
            note = {To appear}
            }
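
As a simple illustration of the CRB methodology, the sketch below checks numerically that a maximum likelihood estimator attains the bound on a toy linear line-amplitude model; the paper itself treats the far richer non-linear LTE radiative transfer model:

```python
# Hedged sketch of a Cramer-Rao bound check on a toy spectral-line model:
# y = A * s(v) + noise, with a known Gaussian profile s and white noise sigma.
# For this linear model the ML estimator A_hat = <y, s> / <s, s> attains the
# CRB sigma^2 / <s, s>. (The paper treats the non-linear LTE model instead.)
import numpy as np

rng = np.random.default_rng(2)
v = np.linspace(-5, 5, 200)
s = np.exp(-0.5 * v ** 2)          # known line profile
A_true, sigma = 3.0, 0.5
crb = sigma ** 2 / np.dot(s, s)    # Cramer-Rao bound on var(A_hat)

# Monte Carlo estimate of the ML estimator's variance
est = []
for _ in range(5000):
    y = A_true * s + sigma * rng.standard_normal(v.size)
    est.append(np.dot(y, s) / np.dot(s, s))
print(crb, np.var(est))            # the two should agree closely
```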
          
  4. Split-and-augmented Gibbs sampler - Application to large-scale inference problems
    M. Vono, N. Dobigeon, and P. Chainais
    IEEE Transactions on Signal Processing, vol. 67, no. 6, pp. 1648-1661, March 2019

    This paper derives two new optimization-driven Monte Carlo algorithms inspired by variable splitting and data augmentation. In particular, the formulation of one of the proposed approaches is closely related to the main steps of the alternating direction method of multipliers (ADMM). The proposed framework makes it possible to derive sampling schemes that are faster and more efficient than current state-of-the-art methods, and that can embed the latter. By efficiently sampling the parameter to infer as well as the hyperparameters of the problem, the generated samples can be used to approximate Bayesian estimators of the parameters to infer. Additionally, contrary to optimization methods, the proposed approach provides confidence intervals at a low cost. Simulations on two often-studied signal processing problems illustrate the performance of the two proposed samplers. All results are compared to those obtained by recent state-of-the-art optimization and MCMC algorithms used to solve these problems.

            @article{Vono_TSP_2019,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            year = {2019},
            title = {Split-and-augmented {G}ibbs sampler - {A}pplication to large-scale inference problems},
            journal = {IEEE Transactions on Signal Processing},
            volume = {67},
            number = {6},
            pages = {1648--1661}
            }
          

International conference papers

  1. A fully Bayesian approach for inferring physical properties with credibility intervals from noisy astronomical data
    M. Vono et al.
    IEEE Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, Netherlands, 2019

    The atoms and molecules of interstellar clouds emit photons when passing from an excited state to a lower energy state. The resulting emission lines can be detected by telescopes in different wavelength domains (radio, infrared, visible, UV...). Through the excitation and chemical conditions they reveal, these lines provide key constraints on the local physical conditions prevailing in giant molecular clouds (GMCs), which constitute the birthplace of stars in galaxies. Inferring these physical conditions from observed maps of GMCs using complex astrophysical models of these regions remains a difficult challenge due to potentially degenerate solutions and widely varying signal-to-noise ratios over the map. We propose a Bayesian framework to infer the probability distributions associated with each of these physical parameters, taking a spatial smoothness prior into account to tackle the challenge of low signal-to-noise regions of the observed maps. A numerical astrophysical model of the cloud is involved in the likelihood within an approximate Bayesian computation (ABC) method. This enables both the inference of pointwise estimators (e.g., minimum mean square error or maximum a posteriori) and the quantification of the uncertainty associated with the estimation process. The benefits of the proposed approach are illustrated on noisy synthetic observation maps.

            @Inproceedings{Vono_IEEE_WHISPERS_2019,
            author = {Vono, Maxime and others},
            title = {A Fully Bayesian Approach For Inferring Physical Properties With Credibility Intervals From Noisy Astronomical Data},
            booktitle = {IEEE Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS)},
            address = {Amsterdam, Netherlands},
            month = {September},
            year = {2019}
            }
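
The ABC mechanism mentioned in the abstract can be sketched in a few lines. The "simulator" below is a toy stand-in for the numerical astrophysical model, and the prior, summary statistic, and tolerance are illustrative assumptions:

```python
# Hedged sketch of ABC rejection sampling: observed data are generated by an
# unknown parameter theta through a simulator we can only draw samples from.
import numpy as np

rng = np.random.default_rng(3)

def simulator(theta, n=50):
    # Stand-in for the numerical astrophysical model: here just N(theta, 1)
    return rng.normal(theta, 1.0, n)

y_obs = simulator(2.0)             # pretend theta* = 2 is unknown
s_obs = y_obs.mean()               # summary statistic of the observations

# ABC rejection: keep prior draws whose simulated summary lands within eps
accepted = []
for _ in range(20000):
    theta = rng.uniform(-5, 5)     # prior on theta
    if abs(simulator(theta).mean() - s_obs) < 0.2:
        accepted.append(theta)

post = np.array(accepted)
print(post.mean(), post.std())     # approximate posterior moments
```

The accepted draws approximate the posterior; shrinking the tolerance trades acceptance rate for accuracy.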
          
  2. Bayesian image restoration under Poisson noise and log-concave prior
    M. Vono, N. Dobigeon, and P. Chainais
    IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Brighton, U.K., 2019

    In recent years, much research has been devoted to the restoration of Poissonian images using optimization-based methods. On the other hand, the derivation of efficient and general fully Bayesian approaches is still an active area of research, especially when standard regularization functions are used, e.g., the total variation (TV) norm. This paper proposes to use the recent split-and-augmented Gibbs sampler (SPA) to sample efficiently from an approximation of the initial target distribution when log-concave prior distributions are used. SPA embeds proximal Markov chain Monte Carlo (MCMC) algorithms to sample from possibly non-smooth log-concave full conditionals. The benefit of the proposed approach is illustrated in several experiments involving different regularizers and intensity levels, with both analysis and synthesis approaches.

            @Inproceedings{Vono_IEEE_ICASSP_2019b,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            title        = {Bayesian image restoration under {P}oisson noise and log-concave prior},
            booktitle    = {Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP)},
            address      = {Brighton, U.K.},
            month        = {May},
            year         = {2019}
            }
          
  3. Efficient sampling through variable splitting-inspired Bayesian hierarchical models
    M. Vono, N. Dobigeon, and P. Chainais
    IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Brighton, U.K., 2019

    Markov chain Monte Carlo (MCMC) methods are an important class of computational techniques for solving Bayesian inference problems. Much recent research has been dedicated to scaling these algorithms to high-dimensional settings by relying on powerful optimization tools such as gradient information or proximity operators. In a similar vein, this paper proposes a new Bayesian hierarchical model to solve large-scale inference problems, taking inspiration from variable splitting methods. As with the latter, the derived Gibbs sampler makes it possible to divide the initial sampling task into simpler ones. As a result, the proposed Bayesian framework can lead to a faster sampling scheme than state-of-the-art methods, which it can also embed. The strength of the proposed methodology is illustrated on two often-studied image processing problems.

            @Inproceedings{Vono_IEEE_ICASSP_2019a,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            title        = {Efficient sampling through variable splitting-inspired {B}ayesian hierarchical models},
            booktitle    = {Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP)},
            address      = {Brighton, U.K.},
            month        = {May},
            year         = {2019}
            }
          
  4. Sparse Bayesian binary logistic regression using the split-and-augmented Gibbs sampler
    M. Vono, N. Dobigeon, and P. Chainais
    IEEE Int. Workshop Machine Learning for Signal Processing (MLSP), Aalborg, Denmark, 2018
    Finalist for the Best Student Paper Award

    Logistic regression has been extensively used to perform classification in machine learning and signal/image processing. Bayesian formulations of this model with sparsity-inducing priors are particularly relevant when one is interested in drawing credibility intervals with few active coefficients. Along these lines, the derivation of efficient simulation-based methods is still an active research area because of the analytically challenging form of the binomial likelihood. This paper tackles the sparse Bayesian binary logistic regression problem by relying on the recent split-and-augmented Gibbs sampler (SPA). Contrary to usual data augmentation strategies, this Markov chain Monte Carlo (MCMC) algorithm scales in high dimension and divides the initial sampling problem into simpler ones. These sampling steps are then addressed with efficient state-of-the-art methods, namely proximal MCMC algorithms that can benefit from the recent closed-form expression of the proximal operator of the logistic cost function. SPA appears to be faster than efficient proximal MCMC algorithms and presents a reasonable computational cost compared to optimization-based methods, with the advantage of producing credibility intervals. Experiments on handwritten digit classification problems illustrate the performance of the proposed approach.

            @inproceedings{Vono_MLSP18,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            title = {Sparse {B}ayesian binary logistic regression using the split-and-augmented {G}ibbs sampler},
            booktitle = {Proc. IEEE Int. Workshop Machine Learning for Signal Processing (MLSP)},
            address = {Aalborg, Denmark},
            year = {2018}
            }
          

National conference papers

  1. Modèles augmentés asymptotiquement exacts
    M. Vono, N. Dobigeon, and P. Chainais
    GRETSI, Lille, France, 2019

    The introduction of auxiliary variables into a statistical model is commonly used to simplify an inference task or to increase its efficiency. However, introducing these variables in such a way that the initial probability distribution is preserved is often a subtle art. This article presents a unifying statistical framework that removes these obstacles by relaxing the exact-augmentation assumption. This framework, called asymptotically exact data augmentation (AXDA), encompasses certain mixture models, robust Bayesian models, and models built from variable splitting. To illustrate the interest of such an approach, a Gibbs sampler based on an AXDA model is presented.

            @Inproceedings{Vono_GRETSI_2019a,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            title        = {Modèles augmentés asymptotiquement exacts},
            booktitle    = {Proc. GRETSI},
            address      = {Lille, France},
            month        = {August},
            year         = {2019}
            }
          
  2. Un modèle augmenté asymptotiquement exact pour la restauration bayésienne d’images dégradées par un bruit de Poisson
    M. Vono, N. Dobigeon, and P. Chainais
    GRETSI, Lille, France, 2019

    Much work has addressed the restoration of images corrupted by Poisson noise. Most of the proposed approaches rely on optimization or variational approximation algorithms. The latter are fast and efficient but do not allow a precise estimation of credibility intervals under the target posterior distribution. This paper presents a Markov chain Monte Carlo (MCMC) method that restores such images while providing a controlled measure of the uncertainty associated with the estimation. The proposed approach relies on an asymptotically exact augmented model and involves proximal MCMC algorithms to efficiently sample the distributions of interest.

            @Inproceedings{Vono_GRETSI_2019b,
            author = {Vono, Maxime and Dobigeon, Nicolas and Chainais, Pierre},
            title        = {Un modèle augmenté asymptotiquement exact pour la restauration bayésienne d’images dégradées par un bruit de Poisson},
            booktitle    = {Proc. GRETSI},
            address      = {Lille, France},
            month        = {August},
            year         = {2019}
            }
          
Invited talks

  1. Efficient MCMC Sampling with Dimension-Free Convergence Rate using ADMM-type Splitting
    Laplace's Demon Seminar, jointly organized by Criteo AI Lab and Nicolas Chopin, September 2020
  2. Asymptotically exact data augmentation: models, algorithms and theory
    Actuarial Mathematics & Statistics Seminar, organized by the School of Mathematics of Heriot-Watt University, Edinburgh, U.K., February 2020
  3. Asymptotically exact data augmentation: models, algorithms and theory
    Workshop on optimization, probability and simulation, organized by the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), Shenzhen (Chinese University of Hong Kong, Shenzhen), China, December 2019
  4. On artificial intelligence dedicated to supply chain and business strategy
    Mews Digital Day, organized by the management consulting firm Mews Partners, Toulouse, France, April 2019
  5. Split-and-augmented Gibbs sampler - A divide & conquer approach to solve large-scale inference problems
    CRIStAL laboratory seminar organized by the SigMA team, Lille, France, May 2018

Consulting


From September 2018 to September 2019, I worked as a data science/analytics consultant for the Marketing & Strategy department of an international supermarket chain, Intermarché Alimentaire International (ITM AI). These consultancy missions were carried out in parallel with my Ph.D. studies and were an opportunity to apply my previous experience, knowledge, and work to concrete and important issues for retailers, namely sales forecasting, pricing strategy, and promotional events.

Some of the above issues were tackled in my previous internships and my Master's thesis (in applied mathematics), in which I was particularly interested in optimal pricing policies for clearance and/or promotional events. To this end, I did my Master's thesis in partnership with the Pricing department of a French home-improvement and gardening retailer, Leroy Merlin France (LMF), working on pricing-policy optimization for clearance events. The optimal pricing strategy I derived during this work met the requirements and constraints of the Pricing department of LMF (limited number of price changes, modeling of the uncertainty in the buying process, etc.), was tested on their past transactional data, and was later proposed to some French brick-and-mortar stores.

CV [pdf version] (updated on 17.11.19)


2017 - 2020 (expected) Ph.D. - Statistics
University of Toulouse, France

Optimization-driven Monte Carlo algorithms
In spring 2019, I was a visiting research student at the Department of Statistics of the University of Oxford, in Arnaud Doucet's research group.

2016 - 2017 M.Sc. - Applied Mathematics (Probability & Statistics)
University of Lille, France

Obtained with honors

2013 - 2017 M.Sc. - Engineering (Data Science)
Ecole Centrale de Lille, France

Including a gap year
Head of the class (rank: 1)

Find me

On the Web

Mailing address

INP - ENSEEIHT Toulouse
2, rue Charles Camichel
B.P. 7122
31071 Toulouse Cedex 7
France



© 2017. All rights reserved.
