Noise Robustness in Automatic Speech Recognition
The issue of noise robustness in automatic speech recognition is of
practical importance and remains largely unsolved. In this thesis, the
problem is tackled from both the front-end perspective of speech
features and the back-end perspective of speech models. For the front
end, a feature post-processing technique consisting of mean
subtraction, variance normalization, and ARMA filtering is
investigated. Mathematical analyses are carried out for the distortion
of speech features in the presence of additive and convolutional
noise. Extensive experiments are conducted to determine how best to
apply this front-end technique. It is experimentally verified to be
highly effective on the noisy-digit Aurora databases.
This performance gain is achieved without increasing the number of
model parameters or the computational cost.
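The mean-subtraction, variance-normalization, and ARMA-filtering pipeline can be sketched as below. This is a minimal illustration, not the thesis's implementation: the filter half-length `order`, the per-utterance normalization, and the boundary handling are assumptions for the sketch.

```python
import numpy as np

def mva(features, order=2):
    """Sketch of MVA post-processing: mean subtraction, variance
    normalization, then ARMA smoothing across frames.

    features: (num_frames, num_dims) array, e.g. MFCCs.
    order: ARMA half-window M (an assumed default; the thesis
    tunes such settings experimentally).
    """
    # Mean subtraction and variance normalization per feature
    # dimension, computed over the whole utterance.
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

    # ARMA smoothing: each output frame averages the previous M
    # outputs, the current input, and the next M inputs.
    out = np.zeros_like(z)
    num_frames, m = len(z), order
    for t in range(num_frames):
        past = out[max(0, t - m):t].sum(axis=0)
        future = z[t:min(num_frames, t + m + 1)].sum(axis=0)
        # Near the utterance edges fewer terms are available;
        # dividing by the full window is a simplification here.
        out[t] = (past + future) / (2 * m + 1)
    return out
```

The output has the same shape as the input feature matrix, so it can replace the original features with no change to the back-end models, consistent with the claim above.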
For the back end, a novel random variable called a feature selector is
introduced into the speech models to dynamically select a robust
component feature to score while ignoring the others. The values of
the feature selectors are based on either the energy or the spectral
entropy of the signal. This back-end technique does not yield
significant performance gains with the feature streams investigated in
this work, MFCCs and post-processed MFCCs; it nonetheless offers a
novel scheme for integrating multiple information sources.
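As a rough illustration of the selection idea, the sketch below picks one of two feature streams per frame with a hard spectral-entropy threshold. The function names, the threshold value, and the hard decision rule are assumptions for illustration only; in the thesis the feature selector is a random variable inside the speech model, not a deterministic switch.

```python
import numpy as np

def spectral_entropy(spectrum):
    # Normalize the magnitude spectrum into a probability
    # distribution and compute its entropy; a flat (noise-like)
    # spectrum gives high entropy, a peaky one gives low entropy.
    p = spectrum / (spectrum.sum() + 1e-10)
    return -(p * np.log(p + 1e-10)).sum()

def select_features(spectra, primary_stream, robust_stream, threshold=2.0):
    """Hard-threshold caricature of the feature selector: per
    frame, score only one component feature and ignore the other.
    The threshold is an illustrative assumption."""
    out = []
    for t in range(len(spectra)):
        h = spectral_entropy(spectra[t])
        # Low entropy suggests structured speech: use the primary
        # stream; high entropy suggests noise: fall back to the
        # more robust stream.
        out.append(primary_stream[t] if h < threshold else robust_stream[t])
    return np.array(out)
```

An energy-based selector would have the same shape, with frame energy replacing entropy as the switching statistic.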
The full thesis is available in PostScript and PDF formats.