Blind Audio Source Separation - Percussion
We consider the case where the sources are linearly mixed and the
mixtures are underdetermined, i.e. there are fewer mixtures than
sources. Hence, the mixing matrix A has more columns than rows (here,
2 mixtures of 3 sources). Sparsity of the sources is vital for good
separation. Bayesian methods such as the Gibbs sampler (a standard MCMC
simulation method) are used to estimate the sources and the mixing
matrix in the presence of noise. I.i.d. Gaussian noise was added to the
observations, resulting in an SNR of about 16 dB. The mixing matrix
used is A = [0.4000 0.8315 0.5657; -0.6928 -0.3444 0.5657].
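The mixing setup described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Laplacian placeholder sources stand in for the percussion recordings, the helper name `mix_with_noise` is hypothetical, and the SNR is computed over both mixtures jointly, which the page does not specify.

```python
import numpy as np

# Mixing matrix from the text: 2 mixtures (rows) x 3 sources (columns).
A = np.array([[0.4000, 0.8315, 0.5657],
              [-0.6928, -0.3444, 0.5657]])

def mix_with_noise(sources, A, snr_db, rng):
    """Return (x, clean) with x = A @ s + n, n i.i.d. Gaussian at the given SNR."""
    clean = A @ sources                              # shape (2, T)
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise, clean

rng = np.random.default_rng(0)
# Placeholder sparse sources (assumption), not the actual percussion signals.
sources = rng.laplace(size=(3, 20000))
mixtures, clean = mix_with_noise(sources, A, snr_db=16.0, rng=rng)

# Check the SNR that was actually obtained.
measured_snr = 10 * np.log10(np.sum(clean ** 2) / np.sum((mixtures - clean) ** 2))
print(mixtures.shape, round(measured_snr, 1))
```

Because A is 2 x 3, it cannot be inverted to recover the sources even when it is known exactly, which is why the sparsity prior matters.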
4) Percussion Signals
These are the 3 percussion signals.
Percussion Signal 1: Hands Clapping
Percussion Signal 2: High Frequency Drums
Percussion Signal 3: Low Frequency Drums
These are the 2 mixtures.
Mixture 1
Mixture 2
Sparsity indices for various transform types.
Transform Types
1 | 2 | 3 | 4 | 5 | 6
DCT | MDCT | WT (Vai) | WT (Sym8) | WPBB | No Transform
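The page does not say how its sparsity index is defined. As a rough illustration only, the sketch below uses a hypothetical measure (the fraction of coefficients needed to hold 99% of the energy, where smaller means sparser) and a DCT-II built directly in NumPy. The test tone is a synthetic assumption chosen so that the effect of the transform is easy to see.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n, built with NumPy only."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def energy_fraction_index(coeffs, energy=0.99):
    """Fraction of coefficients needed to capture `energy` of the total energy.

    Hypothetical sparsity measure (not the page's own index); smaller is sparser.
    """
    e = np.sort(coeffs ** 2)[::-1]           # energies, largest first
    cum = np.cumsum(e) / np.sum(e)
    return (np.searchsorted(cum, energy) + 1) / coeffs.size

n = 256
C = dct_matrix(n)
t = np.arange(n)
# Synthetic tone that coincides with DCT basis function number 20.
tone = np.cos(np.pi * (2 * t + 1) * 20 / (2 * n))

tone_time = energy_fraction_index(tone)       # index with no transform
tone_dct = energy_fraction_index(C @ tone)    # index after the DCT
print(tone_time, tone_dct)
```

The same signal scores very differently depending on the representation, which is the kind of comparison the table above is making across transforms.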
Results at a glance.
Performance Measures
1 | 2 | 3 | 4
SDR | SIR | SAR | SNR
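For readers unfamiliar with these measures, here is a simplified sketch of how SDR, SIR and SAR are commonly computed in the BSS_EVAL style, together with a plain SNR. It assumes instantaneous (gain-only) distortions and ignores the noise term. The exact definitions used for the results on this page are not stated, so the function names and the synthetic test signals below are assumptions.

```python
import numpy as np

def _db(num, den):
    """Energy ratio of two signals in dB."""
    return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

def bss_metrics(estimate, sources, j):
    """Simplified SDR/SIR/SAR (dB) for an estimate of source j.

    Assumes gain-only distortions and no separate noise term.
    """
    target = sources[j]
    # Part of the estimate explained by the true source j.
    s_target = (estimate @ target) / (target @ target) * target
    # Least-squares projection of the estimate onto the span of all sources.
    gains, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    projection = sources.T @ gains
    e_interf = projection - s_target      # leakage from the other sources
    e_artif = estimate - projection       # everything else (artifacts)
    return {
        "SDR": _db(s_target, e_interf + e_artif),
        "SIR": _db(s_target, e_interf),
        "SAR": _db(s_target + e_interf, e_artif),
    }

def snr_db(reference, estimate):
    """Plain SNR (dB) of an estimate against its reference."""
    return _db(reference, reference - estimate)

rng = np.random.default_rng(1)
sources = rng.laplace(size=(3, 5000))     # placeholder sources (assumption)
# Synthetic estimate: source 0 plus leakage from source 1 plus small artifacts.
estimate = sources[0] + 0.1 * sources[1] + 0.05 * rng.normal(size=5000)

m = bss_metrics(estimate, sources, j=0)
snr = snr_db(sources[0], estimate)
print(m, snr)
```

In this decomposition the interference and artifact terms are orthogonal, so SDR can never exceed SIR or SAR; it summarizes the overall distortion.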
4.1) Discrete Cosine Transform
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
In addition to producing unnatural sounds, using the DCT also results
in significant artifacts.
4.2) Modified Discrete Cosine Transform
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
The reconstructed drum signals are not as natural as the original ones.
Play the original ones again to hear the distinct difference.
Interestingly, the MDCT is not the best transform for percussion audio
signals.
4.3) Wavelet Transform: Vaidyanathan
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
The Vaidyanathan wavelet is optimized for speech coding, but its
performance is the best for the percussion signals, even better than
the Best Basis.
It models the transients very well.
Sparsity is achieved and the "beats" and rhythm of the drums can be
heard very clearly.
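To illustrate why a wavelet basis suits transients, the sketch below decomposes a single click with a full-depth Haar transform and compares it with the DCT of the same click. Haar is used purely as a stand-in, since the Vaidyanathan filter coefficients are not given here; the signal length and click position are arbitrary assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Full-depth orthonormal Haar DWT of a signal whose length is a power of 2."""
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)          # coarser approximation
    # Coarsest approximation first, then details from coarse to fine.
    return np.concatenate([approx] + details[::-1])

n, t0 = 256, 100
click = np.zeros(n)
click[t0] = 1.0                        # idealized transient (assumption)

haar_click = haar_dwt(click)
# Unnormalized DCT-II of a unit impulse at sample t0, written in closed form.
k = np.arange(n)
dct_click = np.cos(np.pi * (2 * t0 + 1) * k / (2 * n))

haar_nonzero = np.count_nonzero(np.abs(haar_click) > 1e-8)
dct_nonzero = np.count_nonzero(np.abs(dct_click) > 1e-8)
print(haar_nonzero, dct_nonzero)
```

A click touches only one coefficient per wavelet scale, whereas its energy spreads over every DCT coefficient, so the wavelet representation of a transient is far sparser.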
4.4) Wavelet Transform: Symmlet 8
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
The wavelet transforms are very similar and they produce good results.
Sparsity is achieved and the "beats" and rhythm of the drums can be
heard clearly.
The sound quality is very similar for any of the three wavelet
transforms.
4.5) Wavelet Packet Best Basis
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
The adaptive Best Basis transform results in the sparsest sources for
the percussion signals.
Notice that the reconstructed signals also closely resemble the
sources.
4.6) No Transform
Reconstructed Percussion Signal 1
Reconstructed Percussion Signal 2
Reconstructed Percussion Signal 3
A transform is needed in the case of percussion signals. The wavelet
transforms are very similar and produce good results; of the transforms
tried, the DWT seems to be the best.