Beamspace blind signal separation for speech enhancement. (English) Zbl 1168.94422

Summary: Signal processing methods for speech enhancement are of vital interest for communications equipment. In particular, multichannel algorithms, which perform spatial filtering to separate signals that have overlapping frequency content but different spatial origins, are important for a wide range of applications. Two of the most popular multichannel methods are blind signal separation (BSS) and beamforming. Briefly, BSS separates mixed sources by optimizing the statistical independence of the outputs, whereas beamforming optimizes the look direction toward the desired source(s). However, both methods have separation limitations: BSS degrades in reverberant environments, and beamforming is very sensitive to array model mismatch. In this paper, we propose a novel hybrid scheme, called beamspace BSS, which is intended to compensate for these separation weaknesses by jointly optimizing the spatial selectivity and the statistical independence of the sources. We show that beamspace BSS significantly outperforms conventional sensor-space BSS in separation performance, particularly in reverberant room environments.
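To make the hybrid scheme concrete, the following is a minimal, purely illustrative sketch of the beamspace-then-BSS idea: project the sensor signals onto a few fixed beams covering the sector of interest, then apply blind separation to the low-dimensional beamspace outputs. It assumes a real-valued instantaneous mixing model and uses scikit-learn's FastICA as the BSS stage; the array geometry, look directions, and helper names are hypothetical stand-ins for the convolutive, frequency-domain processing the paper applies to reverberant speech.

```python
# Toy beamspace-BSS sketch (illustrative assumptions, not the authors' code):
# a real-valued instantaneous mixing model replaces convolutive room acoustics,
# and FastICA stands in for the BSS stage.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
M, N, T = 8, 2, 20000  # sensors, sources, samples

def array_response(theta_deg, n_sensors):
    """Real-valued stand-in for a half-wavelength ULA response at angle theta."""
    m = np.arange(n_sensors)
    return np.cos(np.pi * m * np.sin(np.deg2rad(theta_deg)))

# Two independent, non-Gaussian sources arriving from -20 and +30 degrees.
S = np.vstack([rng.laplace(size=T), rng.uniform(-1, 1, size=T)])
A = np.column_stack([array_response(-20, M), array_response(30, M)])  # M x N mixing
X = A @ S + 0.01 * rng.standard_normal((M, T))                        # sensor signals

# Beamspace transform: B fixed, data-independent beamformers pointed at a grid
# of candidate look directions spanning the sector of interest (B < M).
look_dirs = [-30, 0, 30]
T_bs = np.vstack([array_response(d, M) / M for d in look_dirs])       # B x M
Y = T_bs @ X                                                          # beamspace signals

# BSS on the beamspace outputs instead of the raw sensor signals.
ica = FastICA(n_components=N, random_state=0)
S_hat = ica.fit_transform(Y.T).T                                      # N x T estimates

# Crude separation check: each estimate should correlate strongly with one source.
corr = np.corrcoef(np.vstack([S, S_hat]))[:N, N:]
print(np.round(np.abs(corr), 2))
```

The design point this sketch tries to convey is that the beamspace matrix is chosen from spatial (look-direction) considerations alone, while the statistical-independence criterion is optimized only afterwards on the reduced set of beam outputs.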

MSC:

94A12 Signal theory (characterization, reconstruction, filtering, etc.)

Software:

ICALAB
