As the primary application of this general approach, we explore long-term dependencies among sub-word units within an utterance, where the variables are units such as phones in English. The motivation for developing this model comes from speech recognition, based on the intuition that phones within an utterance are correlated because the utterance comes from a single speaker. Current models, which assume that speech segments are independent, do not capture this effect, even though it carries important information about how sounds relate to one another.
Although discrete dependence tree design algorithms exist, some modifications were needed to apply the technique to speech. We present extensions of prior work, and introduce a new model for continuous observation sequences using hidden dependence trees. Practical limitations of the original algorithm are addressed by robust topology design techniques. The contributions of the thesis also include the development of an efficient algorithm for training discrete and hidden dependence tree models with incomplete data, and the development of a two-level tree growing algorithm that enables the design of large dependence trees.
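For readers unfamiliar with discrete dependence tree design, the following is a minimal sketch of the classical procedure usually attributed to Chow and Liu: estimate the mutual information between every pair of variables and keep the maximum-weight spanning tree. It is assumed here, not stated in the text, that this is the kind of prior algorithm being extended. The function names and the toy data are hypothetical, and the sketch covers none of the thesis's extensions (hidden trees, incomplete data, two-level growing).

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_tree(data):
    """Return the undirected edges of a maximum mutual information spanning tree.

    data: list of samples, each a tuple holding one discrete value per variable.
    """
    num_vars = len(data[0])
    columns = list(zip(*data))

    # Score every pair of variables, strongest dependence first.
    scored = sorted(
        (
            (mutual_information(columns[i], columns[j]), i, j)
            for i, j in combinations(range(num_vars), 2)
        ),
        reverse=True,
    )

    # Kruskal's algorithm with a small union-find to avoid cycles.
    parent = list(range(num_vars))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Hypothetical toy data: variables 0 and 1 always agree, variable 2 is unrelated.
toy_data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree_edges = chow_liu_tree(toy_data)
```

On the toy data, the edge between variables 0 and 1 is selected first because it carries the highest mutual information. With sparse data such estimates become unreliable, which is one plausible reading of the practical limitations that the robust topology design techniques address.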
We apply the model to word recognition by combining its likelihood score with other acoustic and language model scores, showing a small reduction in recognition error rate. We also explore the context-dependent modeling problem with phonetic units conditioned on local context, which is an important step for future use of the model in speech recognition. Finally, we describe the mathematical framework for other speech processing applications of the model, and show how it applies to problems in medical diagnosis, as an example of the broad range of problems for which this work has implications.
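One common way to combine such scores is to rescore a list of candidate transcriptions with a weighted sum of log scores. The sketch below assumes that setup; the text does not say how the combination is actually done. The score names, weights, and numbers are all hypothetical and chosen only to show the mechanism.

```python
def combined_score(log_scores, weights):
    """Weighted sum of log-domain scores for one hypothesis."""
    return sum(weights[name] * log_scores[name] for name in weights)


def best_hypothesis(hypotheses, weights):
    """Pick the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda h: combined_score(h["scores"], weights))


# Hypothetical candidates with made-up log scores (illustration only).
hypotheses = [
    {"words": "candidate A",
     "scores": {"acoustic": -100.0, "language": -20.0, "dependence_tree": -30.0}},
    {"words": "candidate B",
     "scores": {"acoustic": -101.0, "language": -20.0, "dependence_tree": -24.0}},
]

# Hypothetical weights; in practice they would be tuned on held-out data.
baseline_weights = {"acoustic": 1.0, "language": 1.0}
with_tree_weights = {"acoustic": 1.0, "language": 1.0, "dependence_tree": 0.5}

baseline_choice = best_hypothesis(hypotheses, baseline_weights)
tree_choice = best_hypothesis(hypotheses, with_tree_weights)
```

In this made-up case the baseline scores prefer candidate A, while adding the dependence tree score shifts the choice to candidate B. This is the kind of reranking through which an additional knowledge source can change the recognition output.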
The full thesis in postscript format. (2.21 MB)