Majorization-minimization algorithm for smooth Itakura-Saito
nonnegative matrix factorization
ICASSP'2011 Companion page
Cédric Févotte
This demo presents the results of decomposing, by smooth Itakura-Saito NMF,
a 108-second music excerpt from My Heart (Will Always Lead Me Back To You),
recorded by Louis Armstrong and His Hot Five in the 1920s. A denoised and
upmixed (mono to stereo) version of the recording, based on the NMF
decomposition, is given at the bottom of the page.
MATLAB code
Smooth IS-NMF decomposition with fixed dictionary
As described in the paper, we illustrate the effect of regularizing the
rows of H with the following experiment. First we run unpenalized
IS-NMF with K=10 components and 5000 iterations, keeping the solution
with the lowest cost among ten runs from different random
initializations. Then we run smooth IS-NMF with W fixed to, and H
initialized at, the unpenalized solution.
The following figure displays a segment of one of the rows of H
for different values of the parameter λ.
Baseline (unpenalized IS-NMF) (.wav)
Regularized (λ=1) (.wav)
Regularized (λ=10) (.wav)
Regularized (λ=100) (.wav)
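The baseline step of the experiment above can be sketched as follows. This is a minimal NumPy illustration, not the MATLAB code of the paper: it uses the standard multiplicative updates for IS-NMF, a toy random matrix in place of the power spectrogram, and far fewer iterations and restarts than the 5000 and ten used on this page. The function names, the `EPS` floor and the form of `smoothness_penalty` (an IS divergence between adjacent coefficients of each row of H, weighted by λ) are assumptions made for illustration; the majorization-minimization update of H itself is described in the paper and is not reproduced here.

```python
import numpy as np

EPS = 1e-12  # small floor to avoid division by zero (illustrative choice)

def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, K, n_iter, rng):
    """Unpenalized IS-NMF using the standard multiplicative updates."""
    F, N = V.shape
    W = rng.random((F, K)) + EPS
    H = rng.random((K, N)) + EPS
    for _ in range(n_iter):
        V_hat = W @ H + EPS
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
        V_hat = W @ H + EPS
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
    return W, H, is_divergence(V, W @ H + EPS)

def best_of_restarts(V, K, n_iter, n_runs, seed=0):
    """Run IS-NMF from several random initializations, keep the lowest cost."""
    rng = np.random.default_rng(seed)
    runs = [is_nmf(V, K, n_iter, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2])

def smoothness_penalty(H):
    """Assumed penalty: IS divergence between adjacent coefficients of each row."""
    R = H[:, 1:] / H[:, :-1]
    return float(np.sum(R - np.log(R) - 1.0))

# Toy stand-in for a power spectrogram (strictly positive entries).
V = np.random.default_rng(1).random((20, 30)) + 0.1
W, H, cost = best_of_restarts(V, K=3, n_iter=50, n_runs=3)

# Penalized objective evaluated at the unpenalized solution, for several λ.
for lam in (1, 10, 100):
    print(lam, cost + lam * smoothness_penalty(H))
```

In the second stage of the experiment, W would be held at the returned value and only H would be updated to decrease the penalized objective; larger λ weights the penalty more heavily and so favours smoother rows of H.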
Full smooth IS-NMF decomposition
The following figures show the Wiener masks obtained from the
decomposition (values between 0 and 1), which are applied to the
original STFT data and then inverted to reconstruct time-domain
components.
Component 10 (.wav)
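The masking step described above can be sketched as follows, assuming the usual Wiener mask for component k, namely the ratio of its rank-one model w_k h_k to the full model WH. The factors and the complex "STFT" below are random stand-ins, and the inverse STFT is left out; the point is only that the masks lie between 0 and 1 and sum to one in every time-frequency bin, so the masked components add up to the original STFT.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 8, 3  # toy sizes: frequency bins, frames, components

# Stand-ins for a fitted dictionary W and activations H (strictly positive).
W = rng.random((F, K)) + 0.1
H = rng.random((K, N)) + 0.1

# Stand-in for the complex STFT of the recording.
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

V_hat = W @ H  # full model of the power spectrogram

# masks[k] = (w_k h_k) / (sum_j w_j h_j), one F x N mask per component.
masks = np.stack([np.outer(W[:, k], H[k, :]) / V_hat for k in range(K)])

# Apply each mask to the STFT; an inverse STFT of components[k] would then
# give the time-domain component k (omitted here).
components = masks * X

print(masks.min(), masks.max())
print(np.allclose(components.sum(axis=0), X))
```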
Audio restoration
The decomposition produces wideband components (1, 4, 9,
10) and "pitched" components (2, 3, 5-8). The pitched
components capture fragments of the notes of the leading instruments.
Added together, they single out the trumpet
and clarinet.
Components 1 and 2 capture most of the accompaniment (piano,
double bass).
accompaniment (.wav)
Component 4 captures the trombone attacks.
Component 10 captures most of the hiss present on the
recording. Because the decomposition is conservative
(the components add up to the original), discarding component
10 produces a denoised version of the recording.
denoised signal (.wav)
Because the decomposition is conservative, the separated
components can be remixed in stereo with no
degradation, producing a stereo upmix
of the original mono recording. In the following audio sample, the
leading part was mixed to the right (70%),
the accompaniment to the left (70%), and the trombone
was left in the center. The noise component 10 was
discarded, producing a fully restored (denoised and
upmixed) version of the original recording.
denoised and upmixed (.wav)
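The remix described above can be sketched as follows. The page does not specify the panning law, so this sketch assumes that "70%" means a linear gain of 0.7 on the named channel and 0.3 on the other, with the trombone split equally. The function name `upmix` and the random test signals are hypothetical. With gains that sum to one per source, the two channels add back to the denoised mono signal.

```python
import numpy as np

def upmix(lead, accomp, trombone, pan=0.7):
    """Stereo remix of time-domain components; the noise component is left out.

    Assumption: `pan` is a linear gain on the favoured channel, and the
    remaining 1 - pan goes to the other channel.
    """
    left = (1.0 - pan) * lead + pan * accomp + 0.5 * trombone
    right = pan * lead + (1.0 - pan) * accomp + 0.5 * trombone
    return np.stack([left, right], axis=-1)  # shape (n_samples, 2)

# Hypothetical time-domain components (random stand-ins for separated audio).
rng = np.random.default_rng(0)
lead, accomp, trombone = rng.standard_normal((3, 1000))

stereo = upmix(lead, accomp, trombone)
denoised_mono = lead + accomp + trombone  # all components except the noise

# Summing the two channels recovers the denoised mono signal.
print(np.allclose(stereo.sum(axis=-1), denoised_mono))
```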