Blind Audio Source Separation - Percussion

We consider the case where the sources are linearly mixed and the mixture is underdetermined: there are fewer mixtures than sources, so the mixing matrix A has more columns than rows. Sparsity of the sources is vital for good separation in this setting. Bayesian methods such as the Gibbs sampler (a standard MCMC simulation method) are used to estimate the sources and the mixing matrix in the presence of noise.

I.I.D. Gaussian noise was added to the observations, which resulted in an SNR of about 16 dB. The mixing matrix used is given by A = [0.4000 0.8315 0.5657; -0.6928 -0.3444 0.5657].
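
The mixing setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `mix` helper is hypothetical, the sources are random sparse stand-ins rather than the actual recordings, and the SNR is defined jointly over both mixtures, which the page does not specify.

```python
import numpy as np

# Mixing matrix from the text: 2 mixtures (rows), 3 sources (columns).
A = np.array([[0.4000, 0.8315, 0.5657],
              [-0.6928, -0.3444, 0.5657]])

def mix(sources, snr_db=16.0, seed=0):
    """Return (noisy, clean) mixtures x = A s + n for sources of shape (3, T)."""
    rng = np.random.default_rng(seed)
    clean = A @ sources
    # Scale i.i.d. Gaussian noise so the overall SNR equals snr_db (assumed definition).
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise, clean

# Hypothetical stand-in sources: sparse random spikes, not the real percussion signals.
rng = np.random.default_rng(1)
T = 20000
sources = rng.laplace(size=(3, T)) * (rng.random((3, T)) < 0.05)

mixtures, clean = mix(sources)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((mixtures - clean) ** 2))
print(mixtures.shape, round(measured_snr, 1))
```

Because there are only two mixtures for three sources, A cannot simply be inverted, which is why the separation relies on sparsity of the sources.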

4) Percussion Signals

These are the 3 percussion signals.
Percussion Signal 1: Hands Clapping
Percussion Signal 2: High Frequency Drums
Percussion Signal 3: Low Frequency Drums

These are the 2 mixtures.
Mixture 1
Mixture 2

Sparsity Indices for various transform types.



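Transform Types
1) DCT: Discrete Cosine Transform
2) MDCT: Modified Discrete Cosine Transform
3) WT (Vai): Wavelet Transform, Vaidyanathan
4) WT (Sym8): Wavelet Transform, Symmlet 8
5) WPBB: Wavelet Packet Best Basis
6) No Transform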
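
To illustrate how a sparsity index depends on the transform, here is a minimal sketch. The page does not define its sparsity index, so the Hoyer measure used here is an assumed stand-in, and the two test signals (a pure cosine and a single-sample click) are idealized examples rather than the percussion recordings. The DCT is built directly with NumPy to keep the example self-contained.

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer measure (assumed stand-in): 0 for a flat vector, 1 for a single nonzero entry."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.sum(np.abs(x))
    l2 = np.sqrt(np.sum(x ** 2))
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

n = 256
C = dct_matrix(n)
tone = C[20]             # pure cosine: dense in time, a single DCT coefficient
click = np.zeros(n)
click[100] = 1.0         # idealized transient: a single sample in time

sparsity = {
    "tone, no transform": hoyer_sparsity(tone),
    "tone, DCT": hoyer_sparsity(C @ tone),
    "click, no transform": hoyer_sparsity(click),
    "click, DCT": hoyer_sparsity(C @ click),
}
for name, value in sparsity.items():
    print(f"{name}: {value:.3f}")
```

The same signal can be very sparse in one domain and dense in another. Real recordings fall between these idealized extremes, so the transforms have to be compared empirically.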

Results at a glance.


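Performance Measures
1) SDR: Signal-to-Distortion Ratio
2) SIR: Signal-to-Interference Ratio
3) SAR: Signal-to-Artifacts Ratio
4) SNR: Signal-to-Noise Ratio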
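
As a rough guide to what these measures capture, the sketch below computes SDR, SIR and SAR for one estimated source. It assumes the commonly used projection-based definitions (the estimate is split into a target part, an interference part lying in the span of the true sources, and a remaining artifact part); the page does not state which definitions it uses. The noise term, and hence the SNR measure, is omitted for brevity, and the function name `bss_measures` and the random test signals are hypothetical.

```python
import numpy as np

def bss_measures(estimate, sources, j):
    """Return (SDR, SIR, SAR) in dB for an estimate of sources[j]; sources has shape (n, T)."""
    target = sources[j]
    # Part of the estimate explained by the true target source.
    s_target = (estimate @ target) / (target @ target) * target
    # Least-squares projection of the estimate onto the span of all true sources.
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    p_all = sources.T @ coeffs
    e_interf = p_all - s_target      # leakage from the other sources
    e_artif = estimate - p_all       # everything not explained by any source

    def db(num, den):
        return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 10000))   # hypothetical stand-in sources
# Estimate of source 0 with leakage from source 1 plus unrelated artifacts.
estimate = sources[0] + 0.1 * sources[1] + 0.05 * rng.standard_normal(10000)
sdr, sir, sar = bss_measures(estimate, sources, j=0)
print(f"SDR {sdr:.1f} dB, SIR {sir:.1f} dB, SAR {sar:.1f} dB")
```

In this decomposition the SDR accounts for both interference and artifacts, so it is never higher than the SIR or the SAR.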

4.1) Discrete Cosine Transform

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

In addition to producing unnatural sounds, the DCT also introduces significant artifacts.

4.2) Modified Discrete Cosine Transform

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

The reconstructed drum signals are not as natural as the original ones.
Play the original ones again to hear the distinct difference.
Interestingly, the MDCT is not the best transform for percussion audio signals.

4.3) Wavelet Transform: Vaidyanathan

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

The Vaidyanathan wavelets are optimized for speech coding, but their performance is the best for the percussion signals, even better than the Best Basis.
They model the transients very well.
Sparsity is achieved and the "beats" and rhythm of the drums can be heard very clearly.

4.4) Wavelet Transform: Symmlet 8

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

The wavelet transforms are very similar and they produce good results.
Sparsity is achieved and the "beats" and rhythm of the drums can be heard clearly.
The sound quality is very similar with any of the three wavelet transforms.

4.5) Wavelet Packet Best Basis

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

The adaptive Best Basis transform results in the sparsest sources for the percussion signals.
Notice that the reconstructions also closely resemble the original sources.

4.6) No Transform

Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3

A transform is needed in the case of percussion signals.
The wavelet transforms perform similarly and produce good results; the DWT seems to be the best transform.


