Bayesian Adaptation of the Stochastic Segment Model for
Speech Recognition
Burhan Necioglu
Speaker adaptation is frequently used to achieve good speech recognition
performance without the high costs associated with training a
speaker-dependent model. Because they are trained on data from a large number
of speakers, speaker-independent systems can be used by an arbitrary speaker;
however, they do not match the recognition performance of speaker-dependent
systems. Adaptation combines the information already present in the
speaker-independent system with a small amount of data from the target
speaker, yielding a model for that speaker without the effort required to
train a speaker-dependent model. The main goal of this thesis is to
investigate speaker adaptation for recognizers using multivariate Gaussian
densities, specifically the Stochastic Segment Model. A Bayesian approach is
followed, in which the parameters of the speaker-adapted model are estimated
using prior densities obtained from speaker-independent data. Several schemes
have been investigated, including adaptation of the Gaussian mean alone and
joint adaptation of the mean vector and the covariance matrix, the latter
through several indirect methods such as covariance matrix eigenvalue
adaptation and covariance matrix determinant adaptation. Word recognition
experiments on
the Resource Management database show a 16% error reduction using mean
adaptation with roughly 3 minutes of speech, closing nearly half the gap
between speaker-independent and speaker-dependent recognition rates.
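The abstract does not spell out the mean-adaptation estimator. The sketch below shows one common Bayesian (MAP) formulation, under assumptions that may differ from the thesis: the covariance Sigma is held at its speaker-independent value, the prior on the mean is Gaussian N(mu0, Sigma / tau) centered on the speaker-independent mean mu0, and tau is a hypothetical prior weight acting as an "equivalent sample count". Under those assumptions the posterior mode is (tau * mu0 + sum of x_t) / (tau + n), which interpolates between the speaker-independent mean and the sample mean of the n adaptation vectors. The function name map_adapt_mean is illustrative only.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   observations: Sequence[Sequence[float]],
                   tau: float) -> List[float]:
    """Hypothetical MAP estimate of a Gaussian mean (illustrative sketch).

    Assumes a conjugate prior N(prior_mean, Sigma / tau) on the mean with the
    covariance Sigma held fixed, so the posterior mode is
        (tau * prior_mean + sum_t x_t) / (tau + n).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    n = len(observations)
    if n == 0:
        # No adaptation data: keep the speaker-independent mean unchanged.
        return list(prior_mean)
    dim = len(prior_mean)
    sums = [0.0] * dim
    for x in observations:
        if len(x) != dim:
            raise ValueError("observation dimension does not match prior mean")
        for i, value in enumerate(x):
            sums[i] += value
    # Weighted combination of prior mean and data; tau = 0 gives the ML mean.
    return [(tau * m + s) / (tau + n) for m, s in zip(prior_mean, sums)]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    data = [[2.0, 4.0], [4.0, 8.0]]
    print(map_adapt_mean(prior, data, tau=2.0))  # [1.5, 3.0]
    print(map_adapt_mean(prior, data, tau=0.0))  # [3.0, 6.0]
```

With little adaptation data (small n relative to tau) the estimate stays close to the speaker-independent mean, and as n grows it approaches the speaker-dependent sample mean, which is the behavior that makes Bayesian adaptation attractive when only a few minutes of speech are available.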