Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization

ICASSP'2011 Companion page

Cédric Févotte


This demo presents the results of decomposing a 108-second music excerpt with smooth Itakura-Saito NMF. The excerpt is taken from My Heart (Will Always Lead Me Back To You), recorded by Louis Armstrong and His Hot Five in the 1920s. A denoised and upmixed (mono to stereo) version of the recording, based on the NMF decomposition, is given at the bottom of the page.

MATLAB code


Audio data (.wav)
data


Smooth IS-NMF decomposition with fixed dictionary

As described in the paper, we perform the following experiment to illustrate the effect of regularizing the rows of H. First we run unpenalized IS-NMF with K=10 components and 5000 iterations, picking the solution with the lowest cost function among ten runs from different random initializations. Then we run smooth IS-NMF with W and H respectively fixed and initialized to the unpenalized solution. The following figure displays a segment of one of the rows of H for different values of the parameter λ.

[Figure: a segment of one row of H for different values of λ]

Baseline (unpenalized IS-NMF) (.wav)
Regularized (λ=1) (.wav)
Regularized (λ=10) (.wav)
Regularized (λ=100) (.wav)
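
The first step of the experiment above (unpenalized IS-NMF, keeping the best of several random restarts) can be sketched as follows. This is a minimal NumPy illustration, not the MATLAB code linked on this page: the function names, the default iteration count and the toy data are assumptions made for the example. It uses the standard majorization-minimization multiplicative updates for the Itakura-Saito divergence (ratio of negative to positive gradient parts, raised to the power 1/2), and assumes a strictly positive power spectrogram V. The penalized (smooth) update of H is specific to the paper and is not reproduced here.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, K, n_iter=200, seed=0, eps=1e-12):
    """Unpenalized IS-NMF of a strictly positive matrix V (F x N).

    Returns W (F x K), H (K x N) and the cost after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps  # random positive initialization
    H = rng.random((K, N)) + eps
    costs = []
    for _ in range(n_iter):
        # Update H with W fixed
        V_hat = W @ H
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        # Update W with H fixed
        V_hat = W @ H
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        costs.append(is_divergence(V, W @ H))
    return W, H, costs


def best_of_restarts(V, K, n_restarts=10, n_iter=200):
    """Run IS-NMF from several random seeds and keep the lowest final cost."""
    runs = [is_nmf(V, K, n_iter=n_iter, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[2][-1])


if __name__ == "__main__":
    # Toy positive matrix standing in for a power spectrogram (assumption)
    V = np.random.default_rng(1).random((64, 100)) + 0.1
    W, H, costs = best_of_restarts(V, K=10, n_restarts=10, n_iter=100)
    print("final cost of the best run:", costs[-1])
```

In the experiment described above, the dictionary W returned by the best run would then be held fixed while H is re-estimated with the smoothness penalty for each value of λ.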
   


Full smooth IS-NMF decomposition


The following figures show the Wiener masks obtained from the decomposition (values between 0 and 1), which are applied to the original STFT data and then inverted to reconstruct time-domain components.

Component 1 (.wav)

component


Component 2 (.wav)

component


Component 3 (.wav)

component


Component 4 (.wav)

component


Component 5 (.wav)

component


Component 6 (.wav)

component


Component 7 (.wav)

component


Component 8 (.wav)

component


Component 9 (.wav)

component


Component 10 (.wav)

component
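
The Wiener masks shown in this section can be computed from W and H in a few lines. The sketch below is an illustration in NumPy rather than the code used for this page; the function names and the small `eps` safeguard are assumptions. For component k, the mask is the ratio of the rank-one model w_k h_k to the full model WH, so each entry lies between 0 and 1 and the masks sum to one across components.

```python
import numpy as np


def wiener_masks(W, H, eps=1e-12):
    """Return masks of shape (K, F, N): masks[k] = (w_k h_k) / (W H)."""
    # parts[k] is the rank-one power model of component k
    parts = W.T[:, :, None] * H[:, None, :]
    # eps only guards against division by zero
    return parts / (parts.sum(axis=0) + eps)


def component_stfts(X, W, H):
    """Apply each mask to the complex STFT X (F x N) of the original signal."""
    return wiener_masks(W, H) * X[None, :, :]
```

Because the masks sum to one in every time-frequency bin, the masked STFTs add up to the original STFT. Since the inverse STFT is linear, the time-domain components also add up to the original signal, provided the same transform pair is used throughout.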


Audio restoration

original data (.wav)

The decomposition produces wide-band components (1, 4, 9, 10) and "pitched" components (2, 3, 5-8). The pitched components capture fragments of the notes played by the leading instruments. Added together, they single out the trumpet and clarinet.

leading part (.wav)

Components 1 and 2 capture most of the accompaniment (piano, double bass).

accompaniment (.wav)

Component 4 captures the trombone attacks.

trombone (.wav)

Component 10 captures most of the hiss present in the recording. Because of the conservativity of the decomposition (the components add up to the original), discarding component 10 produces a denoised version of the recording.

denoised signal (.wav)

The separated components can be remixed in stereo with no degradation thanks to the conservativity of the decomposition, producing a stereo upmix of the original mono recording. In the following audio sample, the leading part was mixed to the right (70%), the accompaniment to the left (70%), and the trombone was left in the center. The noise component 10 was discarded, producing a fully restored (denoised and upmixed) version of the original recording.

denoised and upmixed (.wav)
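
The remix described above can be sketched as follows. The page does not state which pan law was used, so this example assumes that "70%" means a linear gain of 0.7 in the favored channel and 0.3 in the other; the function names and the `share` parameter are hypothetical. The inputs are the time-domain leading part, accompaniment and trombone, all of the same length, with the noise component already left out.

```python
import numpy as np


def pan(signal, right_share):
    """Split a mono signal into (left, right) using linear gains that sum to 1."""
    return (1.0 - right_share) * signal, right_share * signal


def upmix(lead, accompaniment, trombone, share=0.7):
    """Return a stereo array of shape (n_samples, 2), columns = (left, right)."""
    lead_l, lead_r = pan(lead, share)                # lead mostly on the right
    acc_l, acc_r = pan(accompaniment, 1.0 - share)   # accompaniment mostly on the left
    tbn_l, tbn_r = pan(trombone, 0.5)                # trombone in the center
    left = lead_l + acc_l + tbn_l
    right = lead_r + acc_r + tbn_r
    return np.stack([left, right], axis=1)
```

With linear gains that sum to one per source, adding the left and right channels gives back the sum of the three parts, that is, the denoised mono signal.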