

This is the doctoral thesis of Sebastian Braun at the University of Erlangen-Nuremberg, Germany; it runs to 164 pages.








Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, dereverberation is one of the most challenging tasks in speech enhancement. While in theory perfect dereverberation can be achieved by inverse filtering under some conditions and with knowledge of the room impulse response (RIR), in practice the blind identification of the RIR is not sufficiently accurate and robust in time-varying and noisy acoustic conditions. Therefore, successful dereverberation methods have been developed in the time-frequency domain that often relax the problem to partial dereverberation, where mainly the late reverberation tail is reduced. Although in recent years some robust and efficient methods have been proposed that can reduce the late reverberation tail to some extent, it is still challenging to obtain a dereverberated signal with high audio quality, without speech distortion and artifacts, using real-time processing techniques with minimal delay. In this thesis, we focus on robust dereverberation methods for online processing as required in real-time speech communication systems.
To achieve dereverberation, two main aspects can be exploited: temporal and spatial information. Firstly, reverberation introduces correlation over time and extends the duration of phonemes or sound events. By exploiting temporal correlation, filters can be derived to extract the desired speech signal or to reduce the reverberation. Secondly, by using multiple microphones, spatial information can be exploited to distinguish between the coherent direct sound and the reverberation, which has a spatially diffuse property. To extract the coherent sound, spatial filters, also known as beamformers, can be used that combine the microphone signals such that only sound from a certain direction is extracted, whereas sound from other directions and diffuse sound components are suppressed. In this thesis, a variety of signal models is exploited to model reverberation using temporal and spatial aspects. All considered signal models are defined in the short-time Fourier transform (STFT) domain, which is widely used in many speech and audio processing techniques, therefore allowing simple integration with other existing techniques. In particular, we utilize a narrowband moving average model, a narrowband multichannel autoregressive model, and a spatial coherence based model. For each of these three signal models, a method for dereverberation and noise reduction is proposed. The first main contribution is a single-channel estimator of the late reverberation power spectral density (PSD), which is required to compute a Wiener filter reducing reverberation and noise. The proposed reverberation PSD estimator is based on a narrowband moving average model using relative convolutive transfer functions (RCTFs). In contrast to other single-channel reverberation PSD estimators, the proposed estimator explicitly models time-varying acoustic conditions and additive noise, and requires no prior information on the room acoustics like the reverberation time or the direct-to-reverberation ratio (DRR).
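To illustrate why a late reverberation PSD estimate is needed, here is a minimal sketch of a single-channel Wiener-type spectral gain applied per STFT bin. It is not the thesis's estimator: the reverberation PSD `phi_r` and noise PSD `phi_v` are assumed to be given, the desired-speech PSD is obtained by plain power subtraction, and the function names and the `gain_floor` parameter are hypothetical choices made for this example.

```python
def wiener_gain(y_power, phi_r, phi_v, gain_floor=0.1):
    """Spectral gain for one time-frequency bin.

    y_power: periodogram |Y|^2 of the microphone signal in this bin
    phi_r:   late reverberation PSD estimate (assumed given)
    phi_v:   noise PSD estimate (assumed given)
    """
    interference = phi_r + phi_v
    # Instantaneous desired-speech PSD by power subtraction (an assumption
    # of this sketch, not the estimator proposed in the thesis).
    phi_s = max(y_power - interference, 0.0)
    denom = phi_s + interference
    gain = phi_s / denom if denom > 0.0 else 1.0
    # Lower-bound the gain to limit artifacts and speech distortion.
    return max(gain, gain_floor)


def enhance_frame(frame, phi_r, phi_v, gain_floor=0.1):
    """Apply the gain to every bin of one STFT frame (list of complex values)."""
    return [
        wiener_gain(abs(y) ** 2, r, v, gain_floor) * y
        for y, r, v in zip(frame, phi_r, phi_v)
    ]
```

The gain floor trades interference reduction against distortion: a higher floor leaves more residual reverberation and noise but alters the speech less.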
The second main contribution is a multichannel reverberation PSD estimator based on the spatial coherence, where the reverberation is modeled as an additive diffuse sound component with a time-invariant spatial coherence. In the multichannel case, the desired signal can be estimated by a multichannel Wiener filter (MWF) that requires the reverberation PSD. To mitigate speech distortion and artifacts, a generalized method to independently control the attenuation of reverberation and noise at the output of an MWF is proposed. As there exists a wide variety of such single- and multichannel reverberation PSD estimators, an extensive overview, comparison and benchmark of state-of-the-art estimators is provided. As a cure for a common weakness of all reverberation PSD estimators, a bias compensation for high DRRs is proposed. The third main contribution is an online solution for dereverberation and noise reduction based on a narrowband multichannel autoregressive (MAR) signal model for time-varying acoustic environments. Using this model, the late reverberation is predicted from previous reverberant speech samples using the MAR coefficients, and is then subtracted from the current reverberant signal. A main novelty of this approach is a parallel estimation structure that makes it possible to obtain causal estimates of time-varying MAR coefficients in noisy environments. In addition, a method to control the amount of reverberation and noise reduction independently is proposed. In the last part of this thesis, the three proposed dereverberation systems are compared using objective measures, a listening test, and an automatic speech recognition system. It is shown that the proposed algorithms efficiently reduce reverberation and noise, and can be directly applied in speech communication devices. The theoretical overview and the evaluation show that each dereverberation method has different strengths and limitations.
By considering these algorithms as representatives of their dereverberation class, useful insights and conclusions are provided that can help in choosing a dereverberation method for a specific application.
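The MAR-based idea mentioned in the abstract, predicting the late reverberation from past reverberant frames and subtracting it from the current frame, can be sketched for a single frequency bin as follows. This is only an illustration under strong assumptions: the MAR coefficient matrices are taken as known and fixed, there is no noise, and the function names and data layout are hypothetical. The online estimation of time-varying coefficients in noise, which the abstract names as the actual contribution, is not shown.

```python
def predict_late_reverb(past_frames, coeffs):
    """Predict the late reverberation in one frequency bin.

    past_frames: list of past multichannel frames (each a list of complex
                 values, one per microphone), most recent first
    coeffs:      list of M x M coefficient matrices (nested lists);
                 coeffs[i] is applied to past_frames[i] (assumed known)
    """
    num_mics = len(coeffs[0])
    prediction = [0j] * num_mics
    for frame, matrix in zip(past_frames, coeffs):
        for m in range(num_mics):
            # Row m of the matrix maps all past channels to channel m.
            prediction[m] += sum(
                matrix[m][j] * frame[j] for j in range(num_mics)
            )
    return prediction


def dereverberate(current, past_frames, coeffs):
    """Subtract the predicted late reverberation from the current frame."""
    late = predict_late_reverb(past_frames, coeffs)
    return [y - r for y, r in zip(current, late)]


# Usage with two microphones and a single past frame (toy numbers).
past = [[1 + 0j, 2 + 0j]]
mar = [[[0.5, 0.0], [0.0, 0.25]]]
print(dereverberate([1.5 + 0j, 1.0 + 0j], past, mar))
```

In a practical system the past frames would start a few frames back rather than at the immediately preceding frame, so that the early part of the speech is not predicted away; that delay is left to the caller here.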

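1 Introduction
2 STFT-domain signal models for dereverberation
3 Spectral and spatial reverberation suppression
4 Single-channel late reverberation PSD estimation
5 Multichannel late reverberation PSD estimation
6 MIMO reverberation cancellation based on the multichannel autoregressive model
7 Evaluation and comparison of the proposed dereverberation methods
8 Conclusion and outlook
Appendix A Signal energy ratio definitions for generating simulated signals
Appendix B Performance measures
Appendix C Computing the residual noise and reverberation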




