Reducing the Effects of Linear Channel Distortion on Continuous
Speech Recognition
Rebecca Anne Bates
Speech recognition over telephone channels introduces challenges not present
when speech is recorded using a known, high-quality microphone. The goal of
this work is to mitigate some of the effects of the telephone channel that
reduce accuracy relative to the high-quality condition. For practical
purposes, a telephone channel is often characterized as an unknown linear,
time-invariant system. Given an acoustic model of speech unaffected by channel
distortion, it is possible to modify either the model or the input speech based
on an estimate of the channel to reduce the mismatch between training and test
conditions. This work focuses on channel compensation and extends previous
results through the introduction of a prior distribution of the channel. The
main contributions include: 1) finding a prior distribution for the telephone
lines, 2) assessing the usefulness of maximum likelihood and maximum a
posteriori channel estimates, implemented in the signal space, and 3) assessing
the usefulness of Bayesian learning to modify the model means and covariances
for channel compensation in comparison to maximum likelihood estimation of the
model transformation parameters. A multipass recognition strategy is used,
where the channel estimate is based on the first pass of recognition. The
various methods are evaluated using the Macrophone Natural Numbers corpus. The
best results are obtained with methods using a prior channel distribution with
an acoustic model trained on "cleaned" data; reductions in word error rate of
up to 15% over cepstral mean subtraction are shown. The model space
transformations give only a small improvement over signal space modification.
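
To make the signal space idea concrete, the following is a minimal sketch and not the thesis's implementation. It assumes that a linear, time-invariant channel appears as an additive bias on each cepstral feature vector, that features and the prior are modeled with diagonal Gaussians, and that a single observation variance is shared across all model states. All function and parameter names (cepstral_mean, subtract_bias, map_bias, obs_var, prior_mean, prior_var) are hypothetical. The aligned clean model means stand in for the output of a first recognition pass.

```python
from statistics import fmean


def cepstral_mean(frames):
    """Per-dimension mean over all frames (cepstral mean subtraction baseline)."""
    return [fmean(column) for column in zip(*frames)]


def subtract_bias(frames, bias):
    """Remove an estimated channel bias from every frame (signal space compensation)."""
    return [[c - b for c, b in zip(frame, bias)] for frame in frames]


def map_bias(frames, clean_means, obs_var, prior_mean, prior_var):
    """MAP estimate of an additive cepstral bias h, one value per dimension.

    Assumed model (illustrative only): y_t = x_t + h, with x_t ~ N(mu_t, obs_var)
    and prior h ~ N(prior_mean, prior_var), all diagonal.
    clean_means[t] is the clean model mean aligned to frame t, as would be
    obtained from a first recognition pass.
    """
    num_frames = len(frames)
    estimate = []
    for d in range(len(prior_mean)):
        # Sum of residuals between observed frames and aligned clean means.
        resid_sum = sum(y[d] - mu[d] for y, mu in zip(frames, clean_means))
        # Posterior precision = data precision + prior precision.
        precision = num_frames / obs_var[d] + 1.0 / prior_var[d]
        weighted = resid_sum / obs_var[d] + prior_mean[d] / prior_var[d]
        estimate.append(weighted / precision)
    return estimate


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]
    aligned = [[0.0, 0.0], [0.0, 0.0]]
    h_map = map_bias(frames, aligned, [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
    print(subtract_bias(frames, h_map))
```

Under these assumptions, letting prior_var grow very large reduces the MAP estimate to the maximum likelihood estimate (the mean residual), while a tight prior pulls the estimate toward prior_mean when only a few frames are available.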
The full thesis is available in PostScript format (916 kB).