Discourse Mixture Language Modeling
Conversational speech recognition is a very challenging task due to the
large amount of variability compared to read speech and the corresponding
lack of training data. Where sources of variability are systematic, however,
recognition performance can be improved by modifying the structure of the
language and/or acoustic models that make up a speech recognizer.
The focus of this thesis is on incorporating the discourse structure of
conversational speech into a language model using mixture distributions.
We extend previous work in this area with improved estimation techniques that
use clustering to reduce model order, class-based smoothing techniques, and a
new strategy for unsupervised training to use additional unlabeled data.
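The core idea of a sentence-level mixture language model can be sketched as follows: the probability of a word sequence is a weighted sum, over discourse (or topic) components, of that component's n-gram probability for the whole sentence. This is a minimal illustrative sketch, not the thesis implementation; the dictionary-based component models, the flooring constant, and the function name are all hypothetical.

```python
# Hypothetical sketch of a sentence-level mixture language model.
# Each component model is a dict mapping (history_tuple, word) -> probability.
def sentence_prob(words, component_models, weights):
    """P(w_1..w_n) = sum_k lambda_k * prod_i P_k(w_i | history)."""
    total = 0.0
    for lam, model in zip(weights, component_models):
        p = 1.0
        for i, w in enumerate(words):
            history = tuple(words[max(0, i - 2):i])  # trigram history
            p *= model.get((history, w), 1e-6)       # floor for unseen events
        total += lam * p
    return total
```

Because the mixture is taken at the sentence level rather than the word level, each component model scores the entire sentence before the weighted combination, which lets a single discourse mode dominate a whole utterance.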
In addition, we introduce unsupervised dynamic cache adaptation in order to
capture topic changes as well as discourse dynamics. Experimental results
on the Switchboard corpus show that discourse mixtures give better results
than topic mixtures, with the best discourse mixture model giving a 1.9%
reduction in word error rate over a trigram language model. Further gains
are achieved by adding a dynamic cache.
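The dynamic cache idea can be illustrated with a small sketch: recently hypothesized words form a cache distribution that is interpolated with the static model, so words tied to the current topic become more likely as the conversation proceeds. The class name, window size, unigram cache, and interpolation weight below are illustrative assumptions, not the specific design used in the thesis.

```python
from collections import Counter

# Hypothetical sketch of unsupervised dynamic cache adaptation:
# a sliding window of recent words defines a cache unigram that is
# linearly interpolated with a static unigram model.
class CacheLM:
    def __init__(self, static_unigram, cache_weight=0.1, cache_size=200):
        self.static = static_unigram      # dict: word -> P_static(word)
        self.cache_weight = cache_weight  # interpolation weight for the cache
        self.cache_size = cache_size
        self.recent = []                  # sliding window of recent words

    def prob(self, word):
        p_cache = (Counter(self.recent)[word] / len(self.recent)
                   if self.recent else 0.0)
        p_static = self.static.get(word, 1e-6)
        return (1 - self.cache_weight) * p_static + self.cache_weight * p_cache

    def observe(self, word):
        self.recent.append(word)
        if len(self.recent) > self.cache_size:
            self.recent.pop(0)
```

Because the cache is updated from the recognizer's own output, the adaptation is unsupervised: no topic or discourse labels are needed at test time.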