Blind Audio Source Separation - Musical

We consider the case where the sources are mixed linearly and the problem is underdetermined: there are fewer mixtures than sources, so the mixing matrix A has more columns than rows. Sparsity of the sources is vital for good separation. Bayesian methods such as the Gibbs sampler (a standard MCMC simulation method) are used to estimate the sources and the mixing matrix in the presence of noise.

I.I.D. Gaussian noise was added to the observations, which resulted in an SNR of about 16 dB. The mixing matrix used is given by A = [0.4000 0.8315 0.5657; -0.6928 -0.3444 0.5657].
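
As a rough illustration of this setup, the sketch below forms two noisy mixtures of three sources with the matrix A quoted above. The placeholder Laplacian sources, the function name mix_with_noise, and the choice to define the SNR over both mixtures jointly are assumptions made for the example; they are not taken from the original experiment.

```python
import numpy as np

# Mixing matrix quoted in the text: 2 mixtures (rows), 3 sources (columns).
A = np.array([[0.4000, 0.8315, 0.5657],
              [-0.6928, -0.3444, 0.5657]])


def mix_with_noise(sources, A, snr_db, rng):
    """Return (noisy, clean) mixtures, where noisy = A @ sources + noise.

    The noise is i.i.d. Gaussian, scaled so that the ratio of clean-mixture
    power to noise power is snr_db (assumption: power is averaged over both
    mixtures together).
    """
    clean = A @ sources                                   # shape (2, n_samples)
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise, clean


rng = np.random.default_rng(0)
# Placeholder sources standing in for the three audio clips (hypothetical data).
sources = rng.laplace(size=(3, 50_000))
mixtures, clean = mix_with_noise(sources, A, snr_db=16.0, rng=rng)

# Empirical SNR of the generated mixtures, in dB.
measured_snr = 10.0 * np.log10(
    np.mean(clean ** 2) / np.mean((mixtures - clean) ** 2)
)
print(mixtures.shape, measured_snr)
```

With three sources and only two mixtures, A cannot be inverted, which is why the separation has to lean on sparsity rather than on a simple matrix inverse.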

3) Musical Signals

These are the 3 musical signals.
Musical Signal 1: Arab Strap Guitar
Musical Signal 2: Piano
Musical Signal 3: Guitar

These are the 2 mixtures.
Mixture 1
Mixture 2

Sparsity Indices for various transform types.

Transform Types
1	2	3	4	5	6
DCT	MDCT	WT (Vai)	WT (Sym8)	WPBB	No Transform
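
The page does not state how the sparsity index is computed, so the sketch below uses a stand-in measure: the normalised ratio of the l1 norm to the l2 norm of the coefficients, where lower values mean sparser. It compares a synthetic tonal signal with no transform against its DCT coefficients. The measure, the test signal, and the function names are all assumptions for illustration.

```python
import numpy as np


def dct_ii_orthonormal(x):
    """Orthonormal DCT-II of a 1-D signal, built from an explicit cosine matrix."""
    n = len(x)
    k = np.arange(n)[:, None]            # frequency index (rows)
    t = np.arange(n)[None, :]            # time index (columns)
    basis = np.cos(np.pi * (t + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)          # rescale the DC row so the matrix is orthogonal
    return basis @ x


def l1_l2_ratio(c):
    """Hypothetical sparsity index: ||c||_1 / (sqrt(N) * ||c||_2).

    Lies in [1/sqrt(N), 1]; smaller values indicate a sparser vector.
    """
    c = np.asarray(c, dtype=float)
    return np.sum(np.abs(c)) / (np.sqrt(c.size) * np.linalg.norm(c))


# Synthetic tonal signal: three sinusoids (a stand-in for a musical note).
fs = 8000.0
n = 1024
time = np.arange(n) / fs
signal = (np.sin(2 * np.pi * 220.0 * time)
          + 0.6 * np.sin(2 * np.pi * 440.0 * time)
          + 0.3 * np.sin(2 * np.pi * 660.0 * time))

coeffs = dct_ii_orthonormal(signal)
index_time = l1_l2_ratio(signal)   # transform type 6: no transform
index_dct = l1_l2_ratio(coeffs)    # transform type 1: DCT
print(index_time, index_dct)
```

For a tonal signal like this one, the DCT coefficients concentrate the energy in a few bins, so the index is lower in the transform domain than in the time domain.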

Results at a glance.

Performance Measures
1	2	3	4
SDR	SIR	SAR	SNR
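
The page does not say how these four measures were obtained. One common definition splits each estimated source into a target part plus interference, noise, and artifact errors by orthogonal projection, then takes energy ratios in dB. The sketch below follows that decomposition with gain-only distortion allowed; the function names and the synthetic test signals are assumptions, not the code behind the reported results.

```python
import numpy as np


def energy_ratio_db(num, den):
    """10 * log10 of the energy ratio between two signals."""
    return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))


def project(x, basis):
    """Least-squares projection of x onto the span of the rows of basis."""
    coeffs, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
    return basis.T @ coeffs


def bss_measures(estimate, sources, noises, j):
    """SDR, SIR, SNR and SAR (dB) of `estimate` as an estimate of sources[j]."""
    target = project(estimate, sources[j:j + 1])
    on_sources = project(estimate, sources)
    on_all = project(estimate, np.vstack([sources, noises]))
    e_interf = on_sources - target     # leakage from the other sources
    e_noise = on_all - on_sources      # leakage from the sensor noise
    e_artif = estimate - on_all        # everything else
    return {
        "SDR": energy_ratio_db(target, e_interf + e_noise + e_artif),
        "SIR": energy_ratio_db(target, e_interf),
        "SNR": energy_ratio_db(target + e_interf, e_noise),
        "SAR": energy_ratio_db(target + e_interf + e_noise, e_artif),
    }


rng = np.random.default_rng(1)
sources = rng.normal(size=(3, 20_000))     # hypothetical true sources
noises = rng.normal(size=(2, 20_000))      # hypothetical sensor noise signals
artifact = rng.normal(size=20_000)         # hypothetical algorithmic artifact

# An imperfect estimate of source 0 with known amounts of each error type.
estimate = sources[0] + 0.1 * sources[1] + 0.05 * noises[0] + 0.02 * artifact
m = bss_measures(estimate, sources, noises, j=0)
print(m)
```

Because the three error terms are mutually orthogonal by construction, the SDR can never exceed the SIR; it summarises all error types at once, while the other three isolate one each.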

3.1) Discrete Cosine Transform

Reconstructed Musical Signal 1
Reconstructed Musical Signal 2
Reconstructed Musical Signal 3

3.2) Modified Discrete Cosine Transform

Reconstructed Musical Signal 1
Reconstructed Musical Signal 2
Reconstructed Musical Signal 3

The MDCT is the obvious choice of transform for achieving sparsity with "conventional" sources such as pure music.

3.3) Wavelet Transform: Vaidyanathan

Reconstructed Musical Signal 1
Reconstructed Musical Signal 2
Reconstructed Musical Signal 3

The reconstructed sources sound unnatural when wavelets are used.
This is unsurprising, given that wavelets model transients well whereas musical signals are composed mainly of tonal components.

3.4) Wavelet Transform: Symmlet 8

3.5) Wavelet Packet Best Basis

Reconstructed Musical Signal 1
Reconstructed Musical Signal 2
Reconstructed Musical Signal 3

An adaptive algorithm provides reasonable separation.
The performance indices are identical to those of the DCT but slightly worse than those of the MDCT.
The sound quality is similar to that of the MDCT and DCT, but noise suppression is marginally worse.

3.6) No Transform

Reconstructed Musical Signal 1
Reconstructed Musical Signal 2
Reconstructed Musical Signal 3

No separation is achieved without an appropriate transform.
Musical sources are not sparse in physical space (the time domain).
