Discourse Mixture Language Modeling

Yuliya Lobacheva

Conversational speech recognition is a very challenging task because conversational speech is far more variable than read speech and there is correspondingly little training data. Where sources of variability are systematic, however, recognition performance can be improved by modifying the structure of the language model and/or the acoustic model, the two main components of a speech recognizer. The focus of this thesis is on incorporating the discourse structure of conversational speech into a language model using mixture distributions. We extend previous work in this area with improved estimation techniques that use clustering to reduce model order, class-based smoothing techniques, and a new strategy for unsupervised training that makes use of additional unlabeled data. In addition, we introduce unsupervised dynamic cache adaptation to capture topic changes as well as discourse dynamics. Experimental results on the Switchboard corpus show that discourse mixtures give better results than topic mixtures, with the best discourse mixture model giving a 1.9% reduction in word error rate over a trigram language model. Further gains are achieved by adding a dynamic cache.
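As a rough illustration of what a mixture language model computes, the sketch below scores a sentence under a sentence-level mixture, P(w_1..w_n) = sum_k lambda_k prod_i P_k(w_i). It is not the thesis's implementation: the sentence-level form, the unigram components (the thesis compares against a trigram model), the two discourse classes, and all probabilities and weights are assumptions chosen only to keep the example small.

```python
import math


def sentence_log_prob(sentence, components, weights):
    """Log-probability of a sentence under a sentence-level mixture.

    components: one word->probability dict per mixture component
                (unigram models here, purely for brevity).
    weights:    mixture weights lambda_k, assumed to sum to 1.
    """
    per_component = []
    for probs, weight in zip(components, weights):
        # log(lambda_k) + sum_i log P_k(w_i)
        log_p = math.log(weight)
        for word in sentence:
            log_p += math.log(probs[word])
        per_component.append(log_p)
    # Sum over components in the log domain (log-sum-exp) to avoid underflow.
    peak = max(per_component)
    return peak + math.log(sum(math.exp(lp - peak) for lp in per_component))


# Hypothetical toy components; the class names and numbers are invented.
components = [
    {"yeah": 0.5, "so": 0.3, "budget": 0.2},  # e.g. a "backchannel" class
    {"yeah": 0.1, "so": 0.3, "budget": 0.6},  # e.g. a "statement" class
]
weights = [0.4, 0.6]

print(math.exp(sentence_log_prob(["yeah", "so"], components, weights)))
```

Because the sum over components is taken once per sentence rather than once per word, each component can specialize in a whole utterance type, which is the intuition behind conditioning on discourse structure.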

The full thesis in postscript format. (898 kB)


