Study of the effect of various transforms on performance in blind audio source separation
V. Y. F Tan and C. Févotte


We consider the case where the sources are linearly mixed and the mixtures are underdetermined, i.e. there are fewer observed mixtures than sources, so the mixing matrix A has more columns than rows. Sparsity of the sources is vital for good separation. A Bayesian approach is taken: the Gibbs sampler (a standard MCMC simulation method) is used to estimate the sources and the mixing matrix in the presence of noise.

I.i.d. Gaussian noise was added to the observations, giving an SNR of about 16 dB. The mixing matrix used has 2 rows and 3 columns (two mixtures of three sources) and is given by A = [0.4000 0.8315 0.5657; -0.6928 -0.3444 0.5657].

1) Introduction: Orthonormal Bases

Six orthonormal bases were tested on four sets of signals. The bases are the DCT-IV, the MDCT, two different versions of the DWT, Wickerhauser's Wavelet Packet Best Basis, and the identity basis (no transform). The signal sets are speech signals, musical signals, percussion signals, and a combination of the three.
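To illustrate why an orthonormal basis can help, here is a minimal sketch of the DCT-IV written with the Python standard library only. The function name `dct_iv` and the test signal are assumptions made for this illustration, not the code used in the study. The sketch checks two properties of an orthonormal DCT-IV: it preserves energy, and it is its own inverse. It also shows that a signal which is dense in time can be represented by a single coefficient.

```python
import math

def dct_iv(x):
    """Orthonormal DCT-IV of a real sequence (direct O(N^2) evaluation)."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                    for j in range(n))
        for k in range(n)
    ]

# Hypothetical test signal: a cosine aligned with the DCT-IV basis vector k = 5.
n = 64
signal = [math.cos(math.pi / n * (j + 0.5) * (5 + 0.5)) for j in range(n)]
coeffs = dct_iv(signal)

# Orthonormality means the energy is the same in both domains.
energy_time = sum(v * v for v in signal)
energy_coef = sum(c * c for c in coeffs)

# The orthonormal DCT-IV matrix is symmetric, so applying it twice
# returns the original signal.
reconstructed = dct_iv(coeffs)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))

# Number of coefficients that are not numerically zero.
significant = sum(1 for c in coeffs if abs(c) > 1e-6)
```

Every one of the 64 time samples is non-zero, yet only one transform coefficient is significant. Real audio is not this idealised, but the same effect, in weaker form, is what makes a well-chosen basis useful when separation relies on sparsity.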

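The model used is the standard BSS model

x = As + n

where x is the vector of observed mixtures, A is the mixing matrix, s is the vector of sources and n is the additive noise.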
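The mixing model can be sketched in a few lines of plain Python. The mixing matrix is the one quoted on this page; everything else is an assumption made for illustration: the helper name `mix`, the synthetic sparse sources, the random seed, and the SNR convention (total clean-mixture power over total noise power, with one noise variance shared by both channels). The page does not state how the 16 dB figure was computed.

```python
import math
import random

# Mixing matrix from the text: 2 mixtures, 3 sources.
A = [[0.4000, 0.8315, 0.5657],
     [-0.6928, -0.3444, 0.5657]]

def mix(A, sources, snr_db, rng):
    """Return (clean, noisy) mixtures for x = A s + n.

    Noise is i.i.d. Gaussian, scaled so that the ratio of average
    clean-mixture power to noise power equals snr_db (an assumed convention).
    """
    n_samples = len(sources[0])
    clean = [
        [sum(A[i][j] * sources[j][t] for j in range(len(sources)))
         for t in range(n_samples)]
        for i in range(len(A))
    ]
    signal_power = sum(v * v for row in clean for v in row) / (len(A) * n_samples)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    noisy = [[v + rng.gauss(0.0, noise_std) for v in row] for row in clean]
    return clean, noisy

rng = random.Random(0)
n_samples = 2000

# Hypothetical sparse sources: each sample is non-zero with probability 0.1.
sources = [
    [rng.gauss(0.0, 1.0) if rng.random() < 0.1 else 0.0 for _ in range(n_samples)]
    for _ in range(3)
]

clean, noisy = mix(A, sources, snr_db=16.0, rng=rng)

# Empirical SNR of the generated mixtures, in dB.
noise_energy = sum((y - c) ** 2 for cr, yr in zip(clean, noisy) for c, y in zip(cr, yr))
clean_energy = sum(c * c for row in clean for c in row)
measured_snr_db = 10 * math.log10(clean_energy / noise_energy)
```

Here `noisy` plays the role of x, `sources` the role of s, and the difference between `noisy` and `clean` the role of n. The measured SNR should come out close to the requested 16 dB, with a small deviation due to the finite sample size.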


