The goal of this thesis is to improve the n-gram model by learning local lexical structure. We focus on capturing three types of local lexical structure. First, we model words with an expanded vocabulary to account for the observation that a word can serve different communicative functions, such as different parts of speech. Second, we extend a variable n-gram learning algorithm to allow both skips and word equivalence classes in the word history. The skips are motivated by the occurrence of disfluencies in conversational speech, such as pause fillers and repetitions. The combination of variable n-gram histories and classes allows an extended maximum history length while reducing the number of parameters. Third, we develop algorithms to learn multi-word lexical units (e.g. "you know") using a special form of the variable n-gram learning algorithm. These multi-word units can be modeled either deterministically or non-deterministically. We evaluate our models by the number of free parameters in the model, the test-set perplexity (an entropy-based measure of how well the model predicts held-out text), and the recognition word error rate.
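To make the skip-and-class idea concrete, the following is a minimal Python sketch, not the thesis's learning algorithm: the WORD_CLASS map, FILLERS set, and history_key function are illustrative assumptions, whereas a real system would induce the equivalence classes and skip structure from data.

    from collections import defaultdict

    # Illustrative, hand-written structures (assumed for this sketch;
    # the thesis learns both from data).
    WORD_CLASS = {"you": "PRON", "i": "PRON", "know": "VERB", "see": "VERB"}
    FILLERS = {"uh", "um"}  # disfluency tokens a skip may jump over

    def history_key(preceding, n=3):
        """Map the preceding words to a skip-aware, class-based history."""
        content = [w for w in preceding if w not in FILLERS]  # skip fillers
        tail = content[-(n - 1):]                             # n-gram tail
        return tuple(WORD_CLASS.get(w, w) for w in tail)      # class back-off

    counts = defaultdict(lambda: defaultdict(int))

    def observe(sentence):
        words = sentence.lower().split()
        for i, w in enumerate(words):
            counts[history_key(words[:i])][w] += 1

    observe("i uh you know see")
    # "uh" is skipped and "i"/"you" share the class PRON, so distinct raw
    # histories collapse onto shared keys and fewer parameters are needed.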
We show that by learning the local lexical structure of the language, we can reduce the number of parameters by more than 40% while simultaneously reducing test-set perplexity by 8% and the recognition word error rate by 1%.
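For reference, test-set perplexity follows directly from the average per-word log probability the model assigns to held-out text; a minimal computation sketch (the perplexity function here is ours, not from the thesis):

    import math

    def perplexity(log2_probs):
        """Test-set perplexity: 2 raised to the negative average per-word
        log2 probability the model assigns to the test text."""
        return 2 ** (-sum(log2_probs) / len(log2_probs))

    # A model assigning each of four test words probability 1/8 has
    # perplexity 8, i.e. it is as "surprised" as an 8-way uniform choice.
    print(perplexity([math.log2(1 / 8)] * 4))  # 8.0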
The full thesis in PostScript format (2.00 MB).