Title:
A maximum SNR beamformer approach for blind speech separation under dynamic conditions

Authors:
Shoko Araki and Takuya Yoshioka
NTT Communication Science Laboratories, NTT Corporation, Japan

Abstract:
 The authors participated in task 2 of "Determined convolutive mixtures under dynamic conditions". Our algorithm consists of a blind dereverberation part followed by a speech separation part based on maximum SNR beamformers.
 First, our algorithm dereverberates each mixture signal blindly using the dereverberation part of [1]. This method cancels the reverberation with a linear time-invariant filter that is estimated by the weighted prediction error (WPE) method [1].
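 The WPE idea can be illustrated for a single STFT frequency bin: a multichannel linear predictor, applied to past frames beyond a short prediction delay, estimates the late reverberation, and the prediction error is taken as the dereverberated signal. The predictor is fitted by least squares weighted by the inverse of a time-varying power estimate, alternating between the two. The following is a minimal sketch under these assumptions, not the implementation of [1]; the function name wpe_one_bin and the parameters taps, delay, and iterations are hypothetical.

```python
import numpy as np


def wpe_one_bin(X, taps=10, delay=3, iterations=3, eps=1e-10):
    """Weighted prediction error (WPE) dereverberation for ONE frequency bin.

    X: complex STFT coefficients of shape (M, T) (M microphones, T frames).
    taps, delay, iterations: hypothetical settings, not taken from [1].
    Returns the dereverberated estimate D with the same shape as X.
    """
    M, T = X.shape
    # Stack delayed past observations x[t-delay], ..., x[t-delay-taps+1]
    # into one (M*taps)-dimensional vector per frame (zeros before t=0).
    X_tilde = np.zeros((M * taps, T), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            X_tilde[k * M:(k + 1) * M, shift:] = X[:, :T - shift]

    D = X.astype(complex)
    for _ in range(iterations):
        # Time-varying power of the desired signal, averaged over channels.
        lam = np.maximum(np.mean(np.abs(D) ** 2, axis=0), eps)
        weighted = X_tilde / lam  # divide each frame by its power estimate
        # Weighted correlation matrices of the least-squares prediction problem.
        R = weighted @ X_tilde.conj().T
        P = weighted @ X.conj().T
        G = np.linalg.solve(R + eps * np.eye(M * taps), P)
        # Subtract the predicted reverberation. G is time-invariant over the
        # whole signal; only the weights change between iterations.
        D = X - G.conj().T @ X_tilde
    return D
```

 For a full signal, the function would be applied independently to every frequency bin of the multichannel STFT. The prediction delay keeps the filter from removing the short-term correlation of the speech itself, so mainly the late reverberation is cancelled.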
 Then, we separate each dereverberated mixture using maximum SNR beamformers [2]. This method consists of voice activity detection (VAD), direction of arrival (DOA) clustering, and beamforming. VAD distinguishes speech-present periods from noise-only periods. DOA clustering is performed on the speech-present periods and provides the start and end points of the utterances arriving from each direction. For each target direction, the beamformer coefficients are chosen to maximize the ratio of the power arriving from the target direction to the power from the other directions plus noise. The beamformer coefficients are updated every two seconds.
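 The criterion in the paragraph above is a generalized Rayleigh quotient, (w^H R_t w) / (w^H R_i w), where R_t and R_i are the spatial covariance matrices of the target frames and of the remaining frames in one frequency bin; it is maximized by the principal generalized eigenvector of the pair (R_t, R_i). The sketch below assumes that the frame labels produced by VAD and DOA clustering are already available as a boolean mask, and that each update uses only the frames of the current block; both are assumptions, not necessarily how [2] proceeds. The names max_snr_weights, blockwise_beamform, and block_frames are hypothetical.

```python
import numpy as np


def max_snr_weights(X, target_mask, eps=1e-6):
    """Max-SNR beamformer for ONE frequency bin.

    X: complex STFT coefficients of shape (M, T).
    target_mask: boolean array of shape (T,), True for frames attributed to
        the target direction (assumed to come from VAD + DOA clustering);
        all other frames are treated as other directions plus noise.
    Returns w maximizing (w^H R_t w) / (w^H R_i w).
    """
    M = X.shape[0]
    Xt = X[:, target_mask]
    Xi = X[:, ~target_mask]
    R_t = Xt @ Xt.conj().T / max(Xt.shape[1], 1)
    R_i = Xi @ Xi.conj().T / max(Xi.shape[1], 1) + eps * np.eye(M)
    # Whiten with the Cholesky factor of R_i so that the generalized
    # eigenproblem R_t w = lambda R_i w becomes an ordinary Hermitian one.
    L_inv = np.linalg.inv(np.linalg.cholesky(R_i))
    A = L_inv @ R_t @ L_inv.conj().T
    _, U = np.linalg.eigh(A)          # eigenvalues in ascending order
    return L_inv.conj().T @ U[:, -1]  # principal generalized eigenvector


def blockwise_beamform(X, target_mask, block_frames):
    """Re-estimate w every block_frames frames and apply y[t] = w^H x[t]."""
    M, T = X.shape
    w = np.zeros(M, dtype=complex)
    w[0] = 1.0  # fallback before the first update: pass the first microphone
    Y = np.zeros(T, dtype=complex)
    for start in range(0, T, block_frames):
        block = slice(start, min(start + block_frames, T))
        mask = target_mask[block]
        # Update only if the block has both target and non-target frames;
        # otherwise keep the previous weights.
        if mask.any() and not mask.all():
            w = max_snr_weights(X[:, block], mask)
        Y[block] = w.conj() @ X[:, block]
    return Y
```

 With a hypothetical sampling rate fs and STFT hop size hop (in samples), a two-second update interval corresponds to block_frames = round(2 * fs / hop). Because a generalized eigenvector is defined only up to a complex scale factor, a complete system would also have to fix the scale of w in each frequency bin; that step is omitted here.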


References:
[1] T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, "Blind separation and dereverberation of speech mixtures by joint optimization," accepted for publication in IEEE Transactions on Audio, Speech, and Language Processing, Jan. 2010. Available on IEEE Xplore: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5428853&tag=1

[2] S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation," in Proc. ICASSP 2007, vol. I, pp. 41-45, 2007.