Language Modeling with Sentence-Level Mixtures
Rukmini Iyer
Language models play an important role in improving the accuracy of
continuous speech recognizers. In this thesis, we introduce a new
statistical language model that captures long-term topic dependencies of
words within and across sentences. The work makes two main contributions.
First, we develop a topic-dependent sentence-level mixture language model
that exploits topic constraints within a sentence or paragraph. Because
this model is not Markov and would greatly enlarge the search space, it is
applied only in the last stage of the recognizer's multi-pass search
strategy. Second, we introduce topic-dependent dynamic adaptation
techniques within the mixture-model framework. We also investigate robust
parameter estimation techniques, which are essential given the sparse-data
problems inherent in language modeling. The model is implemented in the BU
speech recognition system and yields a significant improvement in
recognition accuracy. An important advantage of our framework is that it
is a simple extension of existing language modeling techniques and can
easily be integrated with other language modeling advances.
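To make the idea of a sentence-level mixture concrete, here is a minimal sketch under simplifying assumptions that are not taken from the thesis: each topic component is a toy bigram table, unseen bigrams fall back to a hypothetical constant FLOOR instead of proper smoothing, and the mixture weights are given. The key point is that the topic weight multiplies the probability of the whole sentence, log P(W) = log sum_k lambda_k prod_t P_k(w_t | w_{t-1}), rather than each word. Because the implied topic posterior depends on the entire sentence, the model is not Markov, which is why the sketch applies it by rescoring an N-best list (the hypothetical function rescore_nbest) instead of inside the first-pass search.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy bigram table: (previous word, word) -> probability; "<s>" starts a sentence.
Bigram = Dict[Tuple[str, str], float]
FLOOR = 1e-4  # hypothetical stand-in for real smoothing of unseen bigrams

def sentence_logprob(model: Bigram, words: Sequence[str]) -> float:
    """Log probability of a whole sentence under one topic's bigram model."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += math.log(model.get((prev, w), FLOOR))
        prev = w
    return total

def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_logprob(models: List[Bigram], weights: List[float],
                    words: Sequence[str]) -> float:
    """Sentence-level mixture: log sum_k lambda_k * prod_t P_k(w_t | w_{t-1})."""
    return logsumexp([math.log(lam) + sentence_logprob(m, words)
                      for m, lam in zip(models, weights)])

def rescore_nbest(nbest: List[Tuple[List[str], float]], models: List[Bigram],
                  weights: List[float], lm_weight: float = 1.0) -> List[List[str]]:
    """Re-rank (words, acoustic log score) hypotheses by acoustic + weighted LM score."""
    scored = [(ac + lm_weight * mixture_logprob(models, weights, words), words)
              for words, ac in nbest]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored]

# Hypothetical two-topic example.
sports = {("<s>", "the"): 0.5, ("the", "team"): 0.4, ("team", "won"): 0.3}
finance = {("<s>", "the"): 0.5, ("the", "stock"): 0.4, ("stock", "rose"): 0.3}
models, weights = [sports, finance], [0.5, 0.5]

best = rescore_nbest([(["the", "stock", "won"], -10.0),
                      (["the", "team", "won"], -10.0)], models, weights)
print(best[0])
```

With equal acoustic scores, the topically consistent hypothesis "the team won" is ranked first, since one component explains all of its words, while "the stock won" mixes topics and no single component scores it well.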
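Parameter estimation matters here because each topic component sees only part of the training data. One standard technique, shown as an illustrative assumption rather than the thesis's exact procedure, is to hold the topic models fixed and estimate the sentence-level mixture weights by expectation-maximization on held-out sentences. The input loglik[s][k] (a hypothetical name) is the log probability of held-out sentence s under topic model k. The E-step computes each sentence's topic posterior; the M-step sets each weight to the average posterior. A small hypothetical floor keeps every weight positive so that no topic is ruled out on sparse data.

```python
import math
from typing import List

def em_mixture_weights(loglik: List[List[float]], iters: int = 50,
                       floor: float = 1e-6) -> List[float]:
    """EM re-estimation of sentence-level mixture weights with fixed components.

    loglik[s][k] = log P_k(sentence s) on held-out data (hypothetical input).
    """
    n_sent, n_topics = len(loglik), len(loglik[0])
    weights = [1.0 / n_topics] * n_topics  # uniform initialization
    for _ in range(iters):
        counts = [0.0] * n_topics
        for row in loglik:
            m = max(row)  # shift by the row maximum for numerical stability
            # E-step: unnormalized posterior lambda_k * P_k(s) for each topic
            post = [w * math.exp(ll - m) for w, ll in zip(weights, row)]
            z = sum(post)  # positive because every weight is kept above zero
            for k in range(n_topics):
                counts[k] += post[k] / z
        # M-step: average responsibility, floored and renormalized
        raw = [max(c / n_sent, floor) for c in counts]
        total = sum(raw)
        weights = [r / total for r in raw]
    return weights

# Three held-out sentences strongly favor topic 0 and one favors topic 1.
w = em_mixture_weights([[0.0, -50.0], [0.0, -50.0], [0.0, -50.0], [-50.0, 0.0]])
print(w)
```

Here the estimated weights come out close to 0.75 and 0.25, matching the share of held-out sentences each topic explains.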
The full thesis is available in PostScript format (500 kB).