Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization

ICASSP'2011 Companion page

Cédric Févotte

This demo presents the decomposition results obtained by smooth Itakura-Saito NMF on a 108-second music excerpt from My Heart (Will Always Lead Me Back To You), recorded by Louis Armstrong and His Hot Five in the 1920s. A denoised and upmixed (mono to stereo) version of the recording, based on the NMF decomposition, is given at the bottom of the page.

MATLAB code

Audio data (.wav)

Smooth IS-NMF decomposition with fixed dictionary

As described in the paper, we illustrate the effect of regularizing the rows of H with the following experiment. First we run unpenalized IS-NMF with K=10 components and 5000 iterations, picking the solution with the lowest cost function value among ten runs from different random initializations. Then we run smooth IS-NMF with W fixed to, and H initialized to, the unpenalized solution. The following figure displays a segment of one of the rows of H for different values of the parameter λ.

Baseline (unpenalized IS-NMF) (.wav)
Regularized (λ=1) (.wav)

Regularized (λ=10) (.wav)
Regularized (λ=100) (.wav)
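
The first step of the experiment above (unpenalized IS-NMF, keeping the best of several random restarts) can be sketched as follows. This is a minimal NumPy illustration and not the MATLAB code distributed with this page: the function names, the random initialization, and the small constant eps are assumptions made for the sketch. It uses the multiplicative update with exponent 1/2, which is the majorization-minimization form known for the Itakura-Saito divergence. The smoothness penalty on the rows of H is not implemented here.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence d(x|y) = x/y - log(x/y) - 1, summed over entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, K, n_iter, rng, eps=1e-12):
    """Unpenalized IS-NMF of a positive matrix V (F x N) into W (F x K) and H (K x N).

    Hypothetical helper: random positive initialization, then alternating
    multiplicative updates with exponent 1/2 (the MM form for IS).
    """
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= ((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))) ** 0.5
        V_hat = W @ H + eps
        W *= (((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)) ** 0.5
    return W, H, is_divergence(V, W @ H + eps)

def best_of_runs(V, K, n_iter, n_runs, seed=0):
    """Run IS-NMF from n_runs random initializations and keep the lowest-cost result."""
    best = None
    for r in range(n_runs):
        rng = np.random.default_rng(seed + r)
        W, H, cost = is_nmf(V, K, n_iter, rng)
        if best is None or cost < best[2]:
            best = (W, H, cost)
    return best
```

With a power spectrogram V, a call such as best_of_runs(V, K=10, n_iter=5000, n_runs=10) would mirror the settings quoted above. The returned W and H would then serve as the fixed dictionary and the initial activations for the penalized run.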

Full smooth IS-NMF decomposition

The following figures show the Wiener masks obtained from the decomposition (values between 0 and 1), which are applied to the original STFT data and then inverted to reconstruct time-domain components.

Component 1 (.wav)

Component 2 (.wav)

Component 3 (.wav)

Component 4 (.wav)

Component 5 (.wav)

Component 6 (.wav)

Component 7 (.wav)

Component 8 (.wav)

Component 9 (.wav)

Component 10 (.wav)
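
The Wiener masking described at the top of this section can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names and the constant eps are hypothetical, X stands for the complex STFT of the recording, and W and H are the factors of its power spectrogram. Mask k is the share of component k in the model W H, so the masks lie between 0 and 1 and sum to one in every time-frequency bin. The inverse STFT that returns each component to the time domain is left out.

```python
import numpy as np

def wiener_masks(W, H, eps=1e-12):
    """Return masks of shape (K, F, N): masks[k] = outer(W[:, k], H[k, :]) / (W @ H)."""
    V_hat = W @ H + eps
    # Broadcast (K, F, 1) * (K, 1, N) to build all rank-one terms at once.
    rank_one_terms = W.T[:, :, None] * H[:, None, :]
    return rank_one_terms / V_hat[None, :, :]

def component_stfts(X, W, H):
    """Apply each mask to the complex STFT X (F x N); returns shape (K, F, N)."""
    return wiener_masks(W, H) * X[None, :, :]
```

Because the masks sum to one, summing the masked STFTs over the K components gives back X, which is the property used in the audio restoration section.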

Audio restoration

original data (.wav)

The decomposition produces broadband components (1, 4, 9, 10) and "pitched" components (2, 3, 5-8). The pitched components capture fragments of the notes played by the leading instruments. Added together, they single out the trumpet and clarinet.

leading part (.wav)

Components 1 and 2 capture most of the accompaniment (piano, double bass).

accompaniment (.wav)

Component 4 captures the trombone attacks.

trombone (.wav)

Component 10 captures most of the hiss noise present in the recording. Because the decomposition is conservative (the components add up to the original), discarding component 10 produces a denoised version of the recording.

denoised signal (.wav)

The separated components can be remixed in stereo with no degradation thanks to the conservativity of the decomposition, producing a stereo upmix of the original mono recording. In the following audio sample, the leading part was mixed to the right (70%), the accompaniment to the left (70%), and the trombone was left in the center. The noise component 10 was discarded, producing a fully restored (denoised and upmixed) version of the original recording.

denoised and upmixed (.wav)
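
The denoising and remixing steps described above can be sketched as follows. This is a minimal NumPy illustration, not the procedure actually used to produce the audio samples: the function names are hypothetical, and reading "70%" as a linear panning gain (0.7 to one channel, 0.3 to the other, 0.5 to each for the center) is an assumption. Under that assumption the two channels add up to the denoised mono signal, so the upmix stays conservative.

```python
import numpy as np

def denoise(components, noise_index):
    """Sum all time-domain components (rows of a K x T array) except the noise one."""
    keep = [k for k in range(components.shape[0]) if k != noise_index]
    return components[keep].sum(axis=0)

def upmix(lead, accomp, center, pan=0.7):
    """Linear-gain stereo upmix (assumed panning law); returns an array of shape (T, 2).

    lead goes mostly right, accomp mostly left, center is split equally.
    """
    left = (1.0 - pan) * lead + pan * accomp + 0.5 * center
    right = pan * lead + (1.0 - pan) * accomp + 0.5 * center
    return np.stack([left, right], axis=1)
```

Here lead, accomp, and center would each be the sum of the time-domain components assigned to the leading part, the accompaniment, and the trombone respectively. A constant-power panning law would be an equally plausible choice, but the channels would then no longer sum exactly to the mono signal.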