Blind Audio Source Separation - Combination

We deal with the case where the sources are linearly mixed and the mixing is underdetermined: there are fewer mixtures than sources, so the mixing matrix A has more columns than rows. Sparsity of the sources is vital for good separation. Bayesian methods such as the Gibbs Sampler (a standard MCMC simulation method) are used to estimate the sources and the mixing matrix in the presence of noise.

I.I.D. Gaussian noise was added to the observations, which resulted in an SNR of about 16 dB. The mixing matrix used is given by A = [0.4000 0.8315 0.5657; -0.6928 -0.3444 0.5657].
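
As a rough illustration of this setup, the sketch below mixes three sources into two noisy observations using the matrix A quoted above. Several things here are assumptions and not taken from the experiment: the sources are synthetic Laplacian-like stand-ins for the real audio, the names `mix` and `measured_snr_db` are made up for the example, and the SNR is taken jointly over both mixtures.

```python
import math
import random

# Mixing matrix quoted in the text: 2 mixtures (rows) by 3 sources (columns).
A = [[0.4000, 0.8315, 0.5657],
     [-0.6928, -0.3444, 0.5657]]

def mix(A, sources, target_snr_db, rng):
    """Return (clean, noisy) mixtures for x = A s + n with i.i.d. Gaussian n."""
    n = len(sources[0])
    clean = [[sum(a * s[t] for a, s in zip(row, sources)) for t in range(n)]
             for row in A]
    # Average power of the clean mixtures over all channels and samples.
    power = sum(v * v for ch in clean for v in ch) / (len(clean) * n)
    # Noise standard deviation that gives the requested SNR (assumed joint SNR).
    sigma = math.sqrt(power / 10 ** (target_snr_db / 10))
    noisy = [[v + rng.gauss(0.0, sigma) for v in ch] for ch in clean]
    return clean, noisy

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between the clean and noisy mixtures."""
    signal = sum(v * v for ch in clean for v in ch)
    noise = sum((c - x) ** 2
                for cc, nc in zip(clean, noisy) for c, x in zip(cc, nc))
    return 10 * math.log10(signal / noise)

rng = random.Random(0)
# Hypothetical sparse sources: random sign times an exponential magnitude.
sources = [[rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(2000)]
           for _ in range(3)]
clean, noisy = mix(A, sources, 16.0, rng)
print(len(noisy), "mixtures, SNR (dB):", round(measured_snr_db(clean, noisy), 1))
```

With only two rows in A, the three sources cannot be recovered by inverting the mixing matrix, which is why the sparsity of the sources matters for the estimation.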

5) Combination of Signals

These are the 3 source signals, each of a different type.
Speech Signal 1
"While the Jeffers had reached its limit, it was now mid-August, which meant he had been separated from Marshall from..."
Musical Signal 2
Piano
Percussion Signal 3
Low Frequency Drums

These are the 2 mixtures.
Mixture 1
Mixture 2

Sparsity Indices for various transform types.


Transform Types
1	2	3	4	5	6
DCT	MDCT	WT (Vai)	WT (Sym8)	WPBB	No Transform
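
The page does not state how the sparsity index is defined. Purely as an illustration of why the choice of transform matters, the sketch below compares a signal in the time domain with its DCT coefficients using Hoyer's sparseness measure (0 for a flat vector, 1 for a single nonzero entry). The measure, the test tone, and the helper names `dct2` and `hoyer_sparsity` are assumptions for this example, not the quantities plotted here.

```python
import math

def dct2(x):
    """Orthonormal DCT-II, computed directly in O(N^2) for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                               for t in range(n)))
    return out

def hoyer_sparsity(x):
    """Hoyer's sparseness: (sqrt(n) - l1/l2) / (sqrt(n) - 1), in [0, 1]."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

n = 256
# Hypothetical test signal: exactly the k = 20 DCT-II basis function.
tone = [math.cos(math.pi * (t + 0.5) * 20 / n) for t in range(n)]
coeffs = dct2(tone)

print("time domain:", round(hoyer_sparsity(tone), 3))
print("DCT domain: ", round(hoyer_sparsity(coeffs), 3))
```

The tone is dense in time but collapses to essentially one coefficient under the DCT, so its sparseness is far higher in the transform domain. Real audio is less extreme, but the same comparison can be made for each transform.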

Results at a glance.

Performance Measures
1	2	3	4
SDR	SIR	SAR	SNR
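
For readers unfamiliar with these measures, the sketch below computes SDR, SIR and SAR following the usual BSS_EVAL-style decomposition of an estimate into a target part, an interference part and an artifact part. It is a simplified version under stated assumptions: instantaneous projections only, no separate noise term (so the SNR measure is omitted), and nonzero interference and artifact parts. The function names and the sinusoidal test signals are hypothetical, and this is not necessarily the code used to produce the results here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, b):
    """Solve G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c

def bss_measures(estimate, sources, j):
    """SDR, SIR, SAR (dB) of `estimate` as an estimate of sources[j]."""
    s_j = sources[j]
    scale = dot(estimate, s_j) / dot(s_j, s_j)
    s_target = [scale * v for v in s_j]            # projection onto the true source
    gram = [[dot(a, b) for b in sources] for a in sources]
    coef = solve(gram, [dot(estimate, s) for s in sources])
    proj = [sum(c * s[t] for c, s in zip(coef, sources))
            for t in range(len(estimate))]         # projection onto all sources
    e_interf = [p - t for p, t in zip(proj, s_target)]
    e_artif = [e - p for e, p in zip(estimate, proj)]
    e_total = [a + b for a, b in zip(e_interf, e_artif)]
    db = lambda num, den: 10 * math.log10(num / den)
    return {"SDR": db(dot(s_target, s_target), dot(e_total, e_total)),
            "SIR": db(dot(s_target, s_target), dot(e_interf, e_interf)),
            "SAR": db(dot(proj, proj), dot(e_artif, e_artif))}

N = 400
sine = lambda f: [math.sin(2 * math.pi * f * t / N) for t in range(N)]
sources = [sine(3), sine(7), sine(11)]             # hypothetical sources
artifact = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
# Estimate of source 0 with 10% leakage from source 1 and a 10% artifact.
estimate = [a + 0.1 * b + 0.1 * c
            for a, b, c in zip(sources[0], sources[1], artifact)]
measures = bss_measures(estimate, sources, 0)
print({k: round(v, 2) for k, v in measures.items()})
```

Higher values are better for all three measures: SIR reflects leakage from the other sources, SAR reflects distortion introduced by the separation itself, and SDR combines both.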

5.1) Discrete Cosine Transform

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

5.2) Modified Discrete Cosine Transform

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

The MDCT is the best basis overall, and it gives the best separation for this combination of signals.
It closely approximates the optimal Karhunen-Loève Transform.

5.3) Wavelet Transform: Vaidyanathan

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

Wavelets perform the worst in this experiment.

5.4) Wavelet Transform: Symmlet 8

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

Wavelets perform the worst in this experiment.

5.5) Wavelet Packet Best Basis

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

The Best Basis also performs well, but the sound quality is clearly worse than with the MDCT above.

5.6) No Transform

Reconstructed Speech Signal 1
Reconstructed Musical Signal 2
Reconstructed Percussion Signal 3

Surprisingly, even without a transform, the signals can be reconstructed.
