Robust Estimation of Stochastic Segment Models for Word Recognition

Ashvin Kannan

In this work, we develop robust estimation techniques for a continuous word recognition system based on the Stochastic Segment Model (SSM). The work is carried out within the N-best rescoring formalism: a less complex system than the SSM generates candidate hypotheses, which the SSM then rescores and reranks. This work focuses on two components of the system: estimating weights for score combination, and robust parameter estimation using clustering techniques to model context. In particular, we develop several agglomerative and divisive clustering techniques for multivariate Gaussian distributions, which we use to cluster triphone models. Compared with unclustered triphones, this yields better estimates with fewer parameters and reduces both word error and storage/computation costs. We also implement an SSM system based on microsegments, which combines mixture modeling with trajectory modeling, and we examine the trade-offs between allocating distributions to time sequences and allocating them to mixture components. Word recognition performance is reported on the speaker-independent portion of the Resource Management database. The word error rates achieved with the SSM, 7.0% for the context-independent system and 4.7% for the context-dependent system, are comparable to the state of the art in speech recognition. Combining the SSM scores with hidden Markov model (HMM) scores reduces the word error rate from 3.8% to 3.1%.
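The abstract does not state which merge criterion the clustering uses. The following is a minimal sketch of one common agglomerative scheme, and it may not match the thesis. It assumes each triphone model is reduced to a single full-covariance Gaussian, summarized by its frame count, mean, and maximum-likelihood covariance. At each step it ties the pair of clusters whose merge loses the least training log-likelihood. The class and function names (`Gaussian`, `merge_cost`, `agglomerate`) and the triphone labels are hypothetical, chosen for illustration.

```python
import itertools
import numpy as np

class Gaussian:
    """Hypothetical model summary: frame count, mean, ML covariance, member labels."""
    def __init__(self, count, mean, cov, members):
        self.count = float(count)
        self.mean = np.asarray(mean, dtype=float)
        self.cov = np.asarray(cov, dtype=float)
        self.members = list(members)

def merge(a, b):
    """Pool two Gaussians as if their training frames were combined (ML estimates)."""
    n = a.count + b.count
    mean = (a.count * a.mean + b.count * b.mean) / n
    # Pooled second moment minus outer product of the pooled mean gives the ML covariance.
    second = (a.count * (a.cov + np.outer(a.mean, a.mean))
              + b.count * (b.cov + np.outer(b.mean, b.mean))) / n
    return Gaussian(n, mean, second - np.outer(mean, mean), a.members + b.members)

def log_det(cov):
    sign, value = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return value

def merge_cost(a, b):
    """Drop in training log-likelihood caused by tying a and b (never negative).

    With ML estimates, a Gaussian's data log-likelihood is -n/2 (d log 2pi + log|S| + d),
    so the constant terms cancel and only the log-determinants remain.
    """
    m = merge(a, b)
    return 0.5 * (m.count * log_det(m.cov)
                  - a.count * log_det(a.cov) - b.count * log_det(b.cov))

def agglomerate(models, num_clusters):
    """Greedy bottom-up clustering: merge the cheapest pair until num_clusters remain."""
    clusters = list(models)
    while len(clusters) > num_clusters:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Usage: two nearby contexts of /ih/ are tied, and the distant model stays separate.
models = [
    Gaussian(100, [0.0, 0.0], np.eye(2), ["s-ih+t"]),
    Gaussian(100, [0.1, 0.0], np.eye(2), ["z-ih+t"]),
    Gaussian(100, [5.0, 5.0], np.eye(2), ["aa-r+iy"]),
]
for cluster in agglomerate(models, 2):
    print(cluster.members)
```

The merge needs only the sufficient statistics (count, mean, covariance), so the raw training frames are never revisited. The naive pair search is cubic in the number of models, which is acceptable for a sketch but would need caching for a full triphone inventory.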

