Speech Recognition System Design Based on Automatically Derived Units
In most speech recognition systems today, acoustic modeling and lexical
modeling are viewed as separable problems. Currently, the most popular
approach is to define canonical word pronunciations manually in terms of
phonetic units and to let the acoustic models implicitly capture, with
Gaussian mixture models, the differences between actual spoken and
canonical pronunciations. As a result, these acoustic models can be very
broad, particularly for casual, spontaneous speech. An alternative
approach, explored in this thesis, is to learn a unit inventory and a
pronunciation dictionary from training data using a maximum likelihood
objective function.
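The toy sketch below illustrates why absorbing pronunciation variation into a Gaussian mixture yields a broader model, and why making the variants explicit units can raise the data likelihood. It is not the thesis's unit-derivation algorithm, which this abstract does not describe. Everything here is a hypothetical assumption: 1-D features, two hand-picked pronunciation variants, fixed (untrained) means and variances, and a known assignment of each frame to its variant. The functions `canonical_loglik` and `split_units_loglik` are illustrative names only.

```python
import math


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_logpdf(x: float, weights, means, variances) -> float:
    """Log density of a 1-D Gaussian mixture, using log-sum-exp for numerical stability."""
    terms = [math.log(w) + gaussian_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical 1-D acoustic features for two pronunciation variants of one canonical phone.
VARIANT_A = [-0.4, 0.1, 0.3, -0.2, 0.5]
VARIANT_B = [3.6, 4.2, 3.9, 4.4, 3.8]


def canonical_loglik() -> float:
    """One canonical unit: a two-component GMM implicitly covers both variants (a broad model)."""
    return sum(gmm_logpdf(x, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
               for x in VARIANT_A + VARIANT_B)


def split_units_loglik() -> float:
    """Two explicit units, each a single Gaussian; each frame is scored by its assigned unit."""
    return (sum(gaussian_logpdf(x, 0.0, 1.0) for x in VARIANT_A)
            + sum(gaussian_logpdf(x, 4.0, 1.0) for x in VARIANT_B))


if __name__ == "__main__":
    print(f"canonical unit + GMM: {canonical_loglik():.3f}")
    print(f"two explicit units:   {split_units_loglik():.3f}")
```

With these made-up values, the explicit-unit model gains roughly log 2 per frame, because each frame no longer pays the 0.5 mixture weight. Real systems must also learn the frame-to-unit assignment and the unit inventory itself, which is where a maximum likelihood objective over training data comes in.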