## Dependence Tree Models of Intra-Utterance Phone Dependence

### Orith Ronen

This thesis addresses the problem of modeling statistical dependence among
a large set of random variables or vectors for use in pattern recognition
applications. For a set of n random variables, the full joint
probability function is in an n-dimensional space, or an nd-dimensional
space for d-dimensional random vectors. For large n, it is useful to
approximate the distribution in a manner that reduces dimensionality and
still captures correlations. To achieve this goal, we approximate the
joint distribution using a class of hierarchical models called dependence
trees. Dependence trees impose a Markov assumption along the branches of a
tree, which lets them model a set of random variables that has no temporal
structure.
As the primary application of this general approach, we explore long-term
dependencies among sub-word units within an utterance, where the variables
are units such as phones in English. The motivation for developing this
model comes from speech recognition, based on the intuition that phones
within an utterance are correlated because the utterance comes from one
speaker. This effect is not included in current models that assume speech
segments are independent, and it provides important information on how
sounds are related to other sounds.
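
For readers unfamiliar with dependence trees, the standard first-order tree factorization (in the sense of Chow and Liu) can be written as below. The notation is an assumption made here for illustration and is not taken from the thesis: $\pi(i)$ denotes the parent of variable $i$ in the tree, and the root has no parent, so its factor is the marginal.

$$
P(x_1, \dots, x_n) \approx \prod_{i=1}^{n} P\bigl(x_i \mid x_{\pi(i)}\bigr)
$$

As a rough sense of the saving, if each variable is discrete with $K$ values, the full joint has $K^n - 1$ free parameters, whereas the tree needs only $(K-1) + (n-1)K(K-1)$: one marginal for the root plus one conditional table per edge.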

Although discrete dependence tree design algorithms exist, some
modifications were needed to apply the technique to speech. We present
extensions of prior work, and introduce a new model for continuous
observation sequences using hidden dependence trees. Practical limitations
of the original algorithm are addressed by robust topology design
techniques. The contributions of the thesis also include the development
of an efficient algorithm for training discrete and hidden dependence tree
models with incomplete data, and the development of a two-level tree
growing algorithm that enables the design of large dependence trees.
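
The existing discrete design algorithm referred to above is presumably of the classical Chow-Liu type: estimate the mutual information between every pair of variables, then keep the maximum-weight spanning tree. The sketch below illustrates only that classical procedure under simple assumptions (complete data, small discrete alphabets, no smoothing). It is not the thesis's robust, hidden, or two-level algorithm, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    count_i = Counter(s[i] for s in samples)
    count_j = Counter(s[j] for s in samples)
    count_ij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_tree(samples):
    """Return the edges of a maximum-mutual-information spanning tree."""
    num_vars = len(samples[0])
    # Score every pair of variables, strongest dependence first.
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges
```

With each row of `samples` holding one observation of all variables (for example, one quantized measurement per phone in an utterance), `chow_liu_tree(samples)` returns `n - 1` undirected edges; choosing any node as the root then fixes the parent of every other node.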

We apply the model to word recognition by combining its likelihood score
with other acoustic and language model scores, showing a small reduction in
recognition error rate. We also explore the context-dependent modeling
problem with phonetic units conditioned on local context, which is an
important step for future use of the model in speech recognition. We
describe the mathematical framework for other speech processing
applications of the model, and how it is applicable to problems in medical
diagnosis as an example of the broad range of problems for which this work
has implications.

The full thesis is available in PostScript format (2.21 MB).
