TITLE:
A joint audio source separation and dereverberation method applied to SiSEC 2010

PRESENTER:
Takuya Yoshioka (NTT Communication Science Labs, Japan)

ABSTRACT:
I present a method for joint audio source separation and
dereverberation, which was applied to one of the SiSEC 2010 tasks,
"Determined Convolutive Mixtures under Dynamic Conditions." This method
is described in detail in our forthcoming paper [1] and is used in our
meeting recognition and understanding system [2, 3]. The method
computes estimates of source signals by using separation and
dereverberation sub-systems that are connected in tandem. The
dereverberation sub-system is based on multi-channel linear prediction
filters while the separation sub-system is based on frequency-domain
separation matrices. The method jointly optimizes the parameters of
both sub-systems so that the output signals conform as closely as
possible to a prescribed speech model. Because it includes the
dereverberation sub-system, the method can separate audio mixtures
even in highly reverberant environments. This method is closely
related to our LVA/ICA 2010 paper [4]. In the presentation, I will
explain the method and show some demonstrations.
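
To make the tandem structure concrete, the following is a minimal
sketch of how the two sub-systems could be applied to the STFT frames
of a single frequency bin once their parameters are known. It is an
illustration under assumptions, not the implementation from [1]: the
function name apply_tandem, the array layouts, and the delay parameter
are hypothetical, and the joint parameter optimization itself is not
shown.

```python
import numpy as np

def apply_tandem(x, G, W, delay=1):
    """Apply dereverberation, then separation, to one frequency bin.

    x     : (M, T) complex STFT frames from M microphones (assumed layout)
    G     : (K, M, M) multi-channel linear prediction matrices, one per tap
            (hypothetical layout; tap k looks back delay + k frames)
    W     : (M, M) separation matrix for this frequency bin
    delay : number of frames skipped before prediction starts (assumption)
    """
    M, T = x.shape
    d = x.astype(complex)  # astype returns a copy, so x is left untouched
    for k in range(G.shape[0]):
        shift = delay + k
        if shift >= T:
            break
        # Dereverberation: subtract the part of frame t that is linearly
        # predicted from the observed frame t - shift.
        d[:, shift:] -= G[k].conj().T @ x[:, :T - shift]
    # Separation: mix the dereverberated channels with the separation matrix.
    return W @ d
```

In this sketch the dereverberated frames are passed straight into the
separation matrix, which mirrors the tandem connection described above.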


REFERENCES:
[1] T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, "Blind
separation and dereverberation of speech mixtures by joint
optimization," IEEE Transactions on Audio, Speech, and Language
Processing, 2010, accepted for publication, doi:
10.1109/TASL.2010.2045183.

[2] S. Araki, et al., "Online meeting recognizer with multichannel
speaker diarization," submitted to the 2010 Asilomar Conference on
Signals, Systems, and Computers (Asilomar 2010), November 2010.

[3] T. Hori, S. Araki, T. Yoshioka, M. Fujimoto, S. Watanabe, T. Oba,
A. Ogawa, K. Otsuka, D. Mikami, K. Kinoshita, T. Nakatani, A.
Nakamura, and J. Yamato, "Real-time meeting recognition and
understanding using distant microphones and omni-directional camera,"
submitted to the 2010 IEEE Workshop on Spoken Language Technology
(SLT 2010), December 2010.

[4] H. Kameoka, T. Yoshioka, M. Hamamura, J. Le Roux, and K. Kashino,
"Statistical model of speech signals based on composite autoregressive
system with application to blind source separation," accepted for
publication in Proceedings of the 9th International Conference on
Latent Variable Analysis and Signal Separation (LVA/ICA 2010),
September 2010.