Robust Estimation of Stochastic Segment Models for Word Recognition
Ashvin Kannan
In this work, we develop robust estimation techniques for a
continuous-word recognition system using the Stochastic Segment Model
(SSM). This work is done under the N-best rescoring formalism, where a less
complex system than the SSM is used to generate candidate hypotheses which
are then rescored and reranked by the SSM. The work focuses on two
components of the system: estimation of weights for score combination, and
robust parameter estimation using clustering techniques to model context.
In particular, we develop several agglomerative and divisive
clustering techniques for multivariate Gaussian distributions, which we use
to cluster triphone models. Clustering yields better estimates with fewer
parameters, which reduces word error as well as storage and computation
costs relative to unclustered triphones. We also implement an SSM system
based on microsegments, which combines mixture modeling with trajectory
modeling, and examine the trade-off between allocating distributions to
time sequences and allocating them to mixture components. Word
recognition performance is reported on the speaker-independent corpus of
the Resource Management database. The recognition rates achieved with the
SSM, 7.0% word error for context-independent and 4.7% for context-dependent
systems, are comparable to the state of the art in speech
recognition. Combining the SSM scores with hidden Markov model (HMM) scores
reduces word error rate from 3.8% to 3.1%.
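
The N-best rescoring formalism described above can be illustrated with a minimal sketch. It assumes each hypothesis carries one log score per knowledge source and that the scores are combined as a weighted sum; the Hypothesis class, the function names, the example sentences, and all score and weight values are hypothetical and are not taken from the thesis, which estimates the combination weights rather than fixing them by hand.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate sentence from the N-best list (hypothetical structure)."""
    words: str
    scores: dict  # knowledge-source name -> log score (made-up values below)


def combined_score(hyp, weights):
    """Weighted linear combination of the per-source log scores."""
    return sum(weights[name] * hyp.scores[name] for name in weights)


def rerank(nbest, weights):
    """Return the N-best list sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: combined_score(h, weights), reverse=True)


# Illustrative N-best list: the first-pass (here "hmm") scores come from the
# simpler system, the "ssm" scores from rescoring each candidate.
nbest = [
    Hypothesis("show all ships", {"hmm": -120.0, "ssm": -98.0}),
    Hypothesis("show tall ships", {"hmm": -119.0, "ssm": -104.0}),
]

# Assumed weights; in practice these would be estimated on held-out data.
weights = {"hmm": 1.0, "ssm": 1.0}
best = rerank(nbest, weights)[0]
print(best.words)
```

With these illustrative numbers, the first-pass score alone prefers the second candidate, while the combined score prefers the first, which is the kind of reordering that rescoring is meant to allow.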
The full thesis in postscript format. (2.04 MB)