# SAMoVA/MINDS seminar

Since the beginning of 2018, we have organized a joint research seminar with the MINDS team of IRIT. The topics mainly focus on machine learning, with deep neural networks holding a significant place. The seminar includes talks by invited speakers as well as workshops. It is organized by Patrice Guyot and Oumaima El Mansouri.

Upcoming Event:

**April, 16th, 2019**. *Weakly-supervised deep learning approaches for sound event detection*. Thomas Pellegrini

Past events in 2018:

**June, 13-15th**. *Deep learning workshop*. András Horváth

**April, 4th**. *Spike-based computing and learning (...)*. Timothée Masquelier

**March, 13th**. *Image compression via parallel compressed sensing (...)*. Sergiy A. Vorobyov

## Weakly-supervised deep learning approaches for sound event detection

**April, 16th, 2019.** Thomas Pellegrini

**Abstract:**
Current impressive results in computer vision and machine listening are mainly driven by strong supervision, made possible by the availability of large annotated datasets. Weakly-supervised approaches aim at lowering the need for carefully annotated data and are a way to eventually strengthen the generalization power of the models to unseen conditions. In this talk, I will review weakly-supervised deep learning approaches, in particular for the task of Sound Event Detection (SED). SED systems aim to detect possibly overlapping audio events and to locate them temporally in recordings, i.e. to determine event onsets and offsets. I will begin with a brief overview of recurrent neural networks, then present state-of-the-art deep neural networks trained when only "weak labels" are available for learning. Weak labels refer to audio tags at recording level, with no information on the temporal onsets and offsets of the annotated events. I will review two main research directions: i) the introduction of attention mechanisms in the network architecture, ii) the use of objective functions inspired by Multiple Instance Learning. I will comment on their limitations and how these could be overcome.
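
As a rough illustration of the two directions mentioned in the abstract, the sketch below pools frame-level event probabilities into a single recording-level probability, which can then be compared with a weak label. It is a minimal sketch and not the speaker's method: the function names (`max_pool`, `attention_pool`, `weak_label_bce`), the toy probabilities, and the attention scores are all assumptions made for illustration, and a real system would obtain them from a trained network.

```python
import math

def max_pool(frame_probs):
    """MIL-style pooling: the recording is positive if any frame is positive."""
    return max(frame_probs)

def attention_pool(frame_probs, attention_scores):
    """Attention pooling: softmax over time, then a weighted average of frames."""
    m = max(attention_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * p for w, p in zip(weights, frame_probs))

def weak_label_bce(clip_prob, tag, eps=1e-7):
    """Binary cross-entropy between the pooled probability and the weak label."""
    p = min(max(clip_prob, eps), 1.0 - eps)
    return -(tag * math.log(p) + (1 - tag) * math.log(1.0 - p))

# Hypothetical frame-level probabilities for one event class (5 frames).
frame_probs = [0.05, 0.10, 0.90, 0.85, 0.10]
# Hypothetical attention scores; higher means the frame matters more.
attention_scores = [0.0, 0.0, 2.0, 2.0, 0.0]

clip_max = max_pool(frame_probs)
clip_att = attention_pool(frame_probs, attention_scores)
loss = weak_label_bce(clip_max, tag=1)  # weak label: event present somewhere
print(clip_max, clip_att, loss)
```

In both cases only the recording-level tag enters the loss; the frame-level probabilities are what would later be thresholded to estimate onsets and offsets.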

## Deep learning workshop

**June, 13th, 14th and 15th, 2018.** András Horváth

### Material: python files, presentation

**Abstract:** Modern machine learning has become ubiquitous across a variety of tasks and, in the past years, has helped solve complex problems that previously seemed unsolvable. The appearance of modern machine learning and deep learning frameworks like TensorFlow and PyTorch has made it possible to implement and train neural networks and complex computation graphs with ease and efficiency. During this workshop, participants will learn how to use these methods and will become familiar with the basic concepts of deep learning, as well as some advanced techniques which can increase the generalization power of these networks.

During the course, the theory of deep learning algorithms and neural networks will be covered; some of them will be implemented in TensorFlow, and their capabilities will be investigated through simple problems. Participants have to register for Google Colab (http://colab.research.google.com), where all the resources used will be available during the course.

**Bio:** András Horváth is an artificial intelligence researcher specialized in machine vision. He is an Associate Professor at Peter Pazmany Catholic University, Faculty of Information Technology and Bionics, where his research and teaching focus on machine intelligence and computer vision.
He is engaged in various international projects, such as the DARPA-UPSIDE program developing efficient algorithms for object detection and recognition exploiting non-Boolean architectures.
The projects implemented by his group and faculty provide solutions in, among others, the healthcare, finance, biology, automotive and agricultural industries.
András regularly lectures and facilitates workshops in machine learning, and he collaborates with international research and academic institutions such as Intel, Hughes Research Lab, M.I.T. and the University of Notre Dame.
As Algorithm Development Lead, he has also worked at a former US-Hungarian start-up, later acquired by a leading global telecommunications company, on advancing cutting-edge computer vision technologies for smart cities.

## Spike-based computing and learning in brains, machines, and visual systems in particular

**April, 4th, 2018.** Timothée Masquelier

### Material: presentation

**Abstract:** We have first shown that, thanks
to the physiological learning mechanism referred to as spike
timing-dependent plasticity (STDP), neurons can detect and learn
repeating spike patterns, in an unsupervised manner, even when
those patterns are embedded in noise, and the detection can be
optimal. Importantly, the spike patterns do not need to repeat
exactly: it also works when only a firing probability pattern
repeats, provided this profile has narrow (10-20 ms) temporal
peaks. Brain oscillations may help in getting the required
temporal precision, in particular when dealing with slowly
changing stimuli. Altogether, these studies show that some
envisaged problems associated with spike timing codes, in particular
noise-resistance, the need for a reference time, or the decoding
issue, might not be as severe as once thought. These generic
STDP-based mechanisms are probably at work in particular in the
visual system, where they can explain how selectivity to visual
primitives emerges, leading to efficient object recognition. High
spike time precision is required, and microsaccades could help.
All these mechanisms are appealing for neuromorphic engineering
applications. They can lead to fast, energy efficient systems
which can learn online, in an unsupervised manner.
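
To make the learning rule concrete, here is a minimal sketch of a pair-based STDP update with exponential windows. It is a common textbook form, assumed here for illustration and not necessarily the variant used in the work presented; the function names and all parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`, the weight bounds) are hypothetical.

```python
import math

def stdp_delta_w(delta_t_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with delta_t = t_post - t_pre (ms)."""
    if delta_t_ms > 0:
        # Pre fires before post: potentiation, decaying with the delay.
        return a_plus * math.exp(-delta_t_ms / tau_plus)
    if delta_t_ms < 0:
        # Post fires before pre: depression.
        return -a_minus * math.exp(delta_t_ms / tau_minus)
    return 0.0

def apply_stdp(weight, pre_spikes_ms, post_spikes_ms, w_min=0.0, w_max=1.0):
    """Sum the updates over all pre/post spike pairs, then clip the weight."""
    for t_post in post_spikes_ms:
        for t_pre in pre_spikes_ms:
            weight += stdp_delta_w(t_post - t_pre)
    return min(max(weight, w_min), w_max)

# An input that reliably fires just before the neuron gets strengthened,
# one that fires just after gets weakened.
w_causal = apply_stdp(0.5, pre_spikes_ms=[10.0], post_spikes_ms=[15.0])
w_acausal = apply_stdp(0.5, pre_spikes_ms=[15.0], post_spikes_ms=[10.0])
print(w_causal, w_acausal)
```

Because the rule only needs spike times and no labels, inputs that repeatedly fire shortly before the postsynaptic neuron are reinforced, which is the intuition behind unsupervised detection of repeating patterns.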

## Image compression via parallel compressed sensing with permutation and segmented compressed sensing – the ways to more efficient compressive beamforming

**March, 13th, 2018.** Sergiy A. Vorobyov

**Abstract:** For a multidimensional signal (like 2D images), if reshaped into a vector, the required size of the sensing matrix in the compressive sensing framework becomes dramatically large, which increases the storage and computational complexity significantly. To solve this problem, the multidimensional signal is reshaped into a 2D signal, which is then sampled and reconstructed column by column using the same sensing matrix. This approach is referred to as parallel compressed sensing, and it has much lower storage requirements and computational complexity. For a given reconstruction performance of parallel compressed sensing, if a so-called acceptable permutation is applied to the 2D signal, the corresponding sensing matrix is shown to have a smaller required order of the restricted isometry property condition, and thus lower storage requirements and computational complexity are required at the decoder. At the level of analog-to-information conversion, an analog signal measured by a number of parallel branches of mixers and integrators (BMIs), each characterized by a specific random sampling waveform, can first be segmented in time. Then the subsamples collected on different segments and different BMIs can be reused so that a larger number of samples than the number of BMIs is collected. This technique is shown to be equivalent to extending the measurement matrix, which consists of the BMI sampling waveforms, by adding new rows without actually increasing the number of BMIs. Such an extended measurement matrix still satisfies the restricted isometry property with overwhelming probability if the original measurement matrix of BMI sampling waveforms satisfies it.
The effect of correlation among measurements for the segmented CS can be characterized by a penalty term in the corresponding bounds on the measurement rate. This penalty term is vanishing as the signal dimension increases, which means that the performance degradation due to the fixed correlation among measurements obtained by the segmented CS (as compared to the standard CS with equivalent size measurement matrix) is negligible for a high-dimensional signal.
These results have important implications for compressive beamforming, one of the major applications of which is ultrasound imaging. If time permits, some of these implications will be informally discussed, with the final objective of improving the computational efficiency of compressive beamforming.
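
The storage argument for parallel compressed sensing made in the abstract can be illustrated with a small sketch of the sampling side only (reconstruction and the permutation step are omitted). It is written under stated assumptions: a random Gaussian sensing matrix, toy dimensions, and hypothetical helper names (`make_sensing_matrix`, `sample_columns`, `sensing_matrix_entries`), none of which come from the talk.

```python
import random

def make_sensing_matrix(m, n, seed=0):
    """Random Gaussian m x n sensing matrix (an assumed, common choice)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def sample_columns(phi, signal_2d):
    """Measure each column of the 2D signal with the same sensing matrix."""
    n_rows, n_cols = len(signal_2d), len(signal_2d[0])
    measurements = []
    for c in range(n_cols):
        column = [signal_2d[r][c] for r in range(n_rows)]
        y = [sum(p * x for p, x in zip(row, column)) for row in phi]
        measurements.append(y)  # one length-m measurement vector per column
    return measurements

def sensing_matrix_entries(m, n, n_cols):
    """Entries to store: shared per-column matrix vs. one matrix for the
    vectorized signal at the same compression ratio m / n."""
    parallel = m * n
    vectorized = (m * n_cols) * (n * n_cols)
    return parallel, vectorized

# Toy 8 x 6 "image", compressed to 4 measurements per column.
n, m, n_cols = 8, 4, 6
rng = random.Random(1)
image = [[rng.random() for _ in range(n_cols)] for _ in range(n)]
phi = make_sensing_matrix(m, n)
measurements = sample_columns(phi, image)
parallel_entries, vectorized_entries = sensing_matrix_entries(m, n, n_cols)
print(parallel_entries, vectorized_entries)
```

With these toy sizes, the shared matrix has `m * n` entries while the vectorized approach would need `n_cols ** 2` times as many, which is the gap the column-by-column scheme avoids.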

**Bio:** Sergiy A. Vorobyov (IEEE M’02–SM’05–F’18) received the M.Sc. and Ph.D. degrees in systems and control from Kharkiv National University in the mid and late 1990s. He is currently a Professor with the Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland. He was previously with the University of Alberta, Edmonton, AB, Canada, as an Assistant, Associate, and then Full Professor. Since his graduation, he has also held various research and faculty positions with Kharkiv National University of Radio Electronics; the Institute of Physical and Chemical Research (RIKEN), Japan; McMaster University, Canada; Duisburg-Essen University and Darmstadt University of Technology, Germany; and the Joint Research Institute between Heriot-Watt University and Edinburgh University, U.K. His research interests include optimization and multilinear algebra methods in signal processing; statistical and array signal processing; sparse signal and image processing; estimation and detection theory; sampling theory; and multiantenna, very large, cooperative, and cognitive systems. Dr. Vorobyov is a recipient of the 2004 IEEE Signal Processing Society Best Paper Award, the 2007 Alberta Ingenuity New Faculty Award (Canada), the 2011 Carl Zeiss Award (Germany), the 2012 NSERC Discovery Accelerator Award (Canada), and other awards.