Speech Recognition System Design Based on Automatically Derived Units

Michiel Bacchiani

Partial Abstract

In most speech recognition systems today, acoustic modeling and lexical modeling are treated as separable problems. The most popular approach at present is to manually define canonical word pronunciations in terms of phonetic units and to let the acoustic models implicitly capture, with Gaussian mixture models, the differences between the pronunciations actually spoken and the canonical ones. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn both the unit inventory and the pronunciation dictionary from training data using a maximum likelihood objective function.

The full thesis is available in PostScript format.

