Noise Robustness in Automatic Speech Recognition

Chia-Ping Chen


Noise robustness in automatic speech recognition is of practical importance and remains largely unsolved. This thesis addresses the problem from two directions: front-end speech features and back-end speech models. For the front end, a feature processing technique consisting of mean subtraction, variance normalization, and ARMA filtering is investigated. The distortion of speech features in the presence of additive and convolutional noise is analyzed mathematically, and extensive experiments are conducted to determine how best to use the technique. It is experimentally verified to be extremely effective on the Aurora noisy-digit databases, and this performance gain is achieved without increasing the number of model parameters or the computational cost. For the back end, a novel random variable called a feature selector is introduced into the speech models to dynamically select a robust component feature to score while ignoring the others. The values of the feature selectors are based on either the energy or the spectral entropy of the signal. With the feature streams investigated in this work, MFCCs and post-processed MFCCs, this back-end technique does not lead to a significant performance gain. It is nevertheless a novel scheme for integrating multiple information sources.
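
As a rough illustration of the front-end processing named in the abstract, the sketch below applies mean subtraction, variance normalization, and ARMA filtering to the time trajectory of each feature dimension within one utterance. The abstract does not give the filter equation, so the ARMA form used here (averaging the previous `order` outputs with the current and next `order` normalized inputs) is an assumption, as are the function names `mva` and `mva_frames`, the default order of 2, and the choice to leave edge frames unsmoothed. It should be read as a minimal sketch, not as the thesis implementation.

```python
from statistics import fmean, pstdev


def mva(track, order=2, eps=1e-12):
    """Normalize and smooth one feature dimension over an utterance.

    `track` is a non-empty sequence of floats (one value per frame).
    `order` is the assumed ARMA filter order; `eps` guards against
    division by zero for constant trajectories.
    """
    mean = fmean(track)
    std = pstdev(track, mu=mean)

    # Mean subtraction and variance normalization:
    # the result has zero mean and (approximately) unit variance.
    norm = [(x - mean) / (std + eps) for x in track]

    # ARMA smoothing (assumed form): average `order` past outputs with
    # the current input and `order` future inputs.
    out = list(norm)  # edge frames keep their normalized values
    for t in range(order, len(norm) - order):
        past_outputs = sum(out[t - order:t])
        current_and_future_inputs = sum(norm[t:t + order + 1])
        out[t] = (past_outputs + current_and_future_inputs) / (2 * order + 1)
    return out


def mva_frames(frames, order=2):
    """Apply `mva` independently to each dimension of a list of frames."""
    tracks = [mva(list(track), order) for track in zip(*frames)]
    return [list(frame) for frame in zip(*tracks)]


if __name__ == "__main__":
    # Hypothetical two-dimensional feature frames, for illustration only.
    frames = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0], [3.0, 50.0], [5.0, 40.0]]
    for frame in mva_frames(frames, order=1):
        print(frame)
```

Because the normalization removes any constant offset and gain from each trajectory, the output of this sketch is unchanged when a feature dimension is shifted or scaled by a positive constant. That property is one common motivation for this kind of normalization when a fixed channel effect appears as an approximately constant shift in cepstral features.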

The full thesis in PostScript format.
The full thesis in PDF format.
