Bayesian Adaptation of the Stochastic Segment Model for Speech Recognition

Burhan Necioglu

Speaker adaptation is frequently used to achieve good speech recognition performance without the high cost of training a speaker-dependent model. Because they are trained on data from a large number of speakers, speaker-independent systems can be used by an arbitrary speaker, but they lack the recognition performance of speaker-dependent systems. Adaptation combines the information already available in the speaker-independent system with a small amount of data from the target speaker, so that a model for that speaker can be trained without the effort required to train a speaker-dependent model. The main goal of this thesis is to investigate speaker adaptation for recognizers that use multivariate Gaussian densities, specifically the Stochastic Segment Model. A Bayesian approach is followed: the parameters of the speaker-adapted model are estimated using prior densities obtained from speaker-independent data. Several schemes are investigated, including adaptation of only the Gaussian mean, and joint adaptation of the mean vector and the covariance matrix by several indirect methods, among them covariance matrix eigenvalue adaptation and covariance matrix determinant adaptation. Word recognition experiments on the Resource Management database achieve a 16% error reduction using mean adaptation with roughly 3 minutes of speech, which is nearly half the difference between speaker-independent and speaker-dependent recognition rates.
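
As an illustration of Bayesian adaptation of a Gaussian mean, the sketch below computes a textbook MAP estimate under a conjugate Gaussian prior centered on the speaker-independent mean. It is not taken from the thesis: the function name `map_adapt_mean`, the prior weight `tau`, and the assumption that the covariance is held fixed are all hypothetical choices made for this example, and the priors actually used in the thesis may differ.

```python
import numpy as np


def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    prior_mean : (d,) speaker-independent mean, used as the prior mean
    frames     : (n, d) adaptation feature vectors from the target speaker
    tau        : assumed prior weight; behaves like a count of "virtual"
                 speaker-independent observations (hypothetical parameter)

    The covariance is assumed known and shared by prior and likelihood,
    so it cancels out of the posterior-mean formula.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[0]
    if n == 0:
        # No adaptation data: fall back to the speaker-independent mean.
        return prior_mean.copy()
    sample_mean = frames.mean(axis=0)
    # Weighted interpolation between the prior mean and the sample mean.
    return (tau * prior_mean + n * sample_mean) / (tau + n)


if __name__ == "__main__":
    si_mean = np.array([0.0, 0.0])
    speaker_frames = np.array([[2.0, 4.0], [4.0, 8.0]])
    print(map_adapt_mean(si_mean, speaker_frames, tau=2.0))
```

With little adaptation data the estimate stays close to the speaker-independent mean, and as the number of frames grows it approaches the target speaker's sample mean, which is the behavior one would want from roughly 3 minutes of adaptation speech.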