Reducing the Effects of Linear Channel Distortion on Continuous Speech Recognition

Rebecca Anne Bates

Speech recognition over telephone channels introduces challenges not present when speech is recorded using a known, high-quality microphone. The goal of this work is to mitigate some of the effects of the telephone channel that reduce accuracy relative to the high-quality condition. For practical purposes, a telephone channel is often characterized as an unknown linear, time-invariant system. Given an acoustic model of speech unaffected by channel distortion, it is possible to modify either the model or the input speech based on an estimate of the channel, reducing the mismatch between training and test conditions. This work focuses on channel compensation and extends previous results by introducing a prior distribution on the channel. The main contributions are: 1) finding a prior distribution for the telephone lines, 2) assessing the usefulness of maximum likelihood and maximum a posteriori channel estimates, implemented in the signal space, and 3) assessing the usefulness of Bayesian learning to modify the model means and covariances for channel compensation, in comparison to maximum likelihood estimation of the model transformation parameters. A multipass recognition strategy is used, in which the channel estimate is based on the first pass of recognition. The various methods are evaluated using the Macrophone Natural Numbers corpus. The best results are obtained with methods that use a prior channel distribution together with an acoustic model trained on "cleaned" data; reductions in word error rate of up to 15% over cepstral mean subtraction are shown. The model space transformations give only a small improvement over signal space modification.
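For readers unfamiliar with the baseline, a linear, time-invariant channel appears as an approximately constant additive offset in the cepstral domain, which is why subtracting the per-utterance mean removes much of it. The sketch below illustrates that idea alongside a simple maximum a posteriori offset estimate with a Gaussian prior. It is a minimal illustration and not the thesis's implementation: it assumes a single diagonal Gaussian for clean speech rather than a full acoustic model, and the function names and parameters (`clean_mean`, `noise_var`, `prior_mean`, `prior_var`) are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def cepstral_mean(frames: Sequence[Frame]) -> Frame:
    """Per-dimension average of the cepstral frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def cepstral_mean_subtraction(frames: Sequence[Frame]) -> List[Frame]:
    """Baseline compensation: subtract the utterance mean from every frame."""
    mean = cepstral_mean(frames)
    return [[f[d] - mean[d] for d in range(len(mean))] for f in frames]


def map_channel_estimate(
    frames: Sequence[Frame],
    clean_mean: Frame,   # assumed mean of clean speech (hypothetical)
    noise_var: Frame,    # assumed per-dimension variance of clean speech
    prior_mean: Frame,   # assumed mean of the channel prior
    prior_var: Frame,    # assumed variance of the channel prior
) -> Frame:
    """MAP estimate of an additive cepstral offset under Gaussian assumptions.

    Model per dimension: y_t = x_t + b, with x_t ~ N(clean_mean, noise_var)
    and b ~ N(prior_mean, prior_var). The posterior mode of b is a
    precision-weighted blend of the data residuals and the prior mean.
    """
    n = len(frames)
    estimate = []
    for d in range(len(clean_mean)):
        residual_sum = sum(f[d] - clean_mean[d] for f in frames)
        numerator = prior_var[d] * residual_sum + noise_var[d] * prior_mean[d]
        denominator = n * prior_var[d] + noise_var[d]
        estimate.append(numerator / denominator)
    return estimate


def compensate(frames: Sequence[Frame], offset: Frame) -> List[Frame]:
    """Signal-space compensation: remove the estimated offset from each frame."""
    return [[f[d] - offset[d] for d in range(len(offset))] for f in frames]
```

Under these assumptions, a very broad prior (large `prior_var`) makes the MAP estimate approach the maximum likelihood estimate, the frame mean minus `clean_mean`, while a narrow prior pulls the estimate toward `prior_mean`, which matters most for short utterances with few frames.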

The full thesis in PostScript format (916 kB).

The full thesis in PDF format (938 kB).
