## Segment Modeling Alternatives for Continuous Speech Recognition

### Owen Kimball

This dissertation presents alternative parametric statistical models
of phonetically-based segments for use in continuous speech
recognition (CSR). A categorization of segment modeling approaches is
proposed according to two characteristics: the assumed form of the
probability distribution and the representation chosen for segment
observations. The question of distribution form divides models into
two groups: those based on conditional probability densities of
feature given label and those using a posteriori probabilities of
label given feature. The second characteristic concerns whether a
model uses a variable or fixed-length representation of observed
speech segments. The choices for both characteristics have important
implications, particularly for context modeling and score
normalization. In this work, specific segment models are developed in
order to understand the benefits and limitations that follow from
these choices.

Mixture distributions are a particular type of conditional density
with appealing modeling properties. Under a special case of segment
models using variable-length representations and conditional
densities, various forms of Gaussian mixture models are examined for
the individual samples of the feature sequence. Within this
framework, a systematic comparison of both existing and novel mixture
modeling techniques is conducted. Parameter-tying alternatives for
frame-level mixtures are explored and good performance is demonstrated
with this approach.

Within the conditional-density variable-length framework, a
generalization of mixture distributions that captures properties of
the complete segment is proposed in the form of a segment-level
mixture model. This approach models intra-segment correlation
indirectly using a mixture of segment-length models, each of which
uses conditionally independent time samples. Parameter estimation
formulae are derived and the model is explored experimentally.
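
As an illustrative sketch only, the contrast between the frame-level and segment-level mixtures described above can be written as follows. The notation is assumed here and is not taken from the thesis: a segment of $T$ frames $Y = (y_1, \dots, y_T)$ with phone label $a$, $K$ mixture components with weights $w_k$, and component densities $p_k$. The sketch also ignores how a variable-length segment is mapped onto model regions.

```latex
% Frame-level mixture: each frame draws its own component,
% so frames are independent given the label a.
p_{\text{frame}}(Y \mid a) = \prod_{t=1}^{T} \sum_{k=1}^{K} w_k \, p_k(y_t \mid a)

% Segment-level mixture: one component is shared by the whole segment,
% and frames are independent only given that component k.
p_{\text{seg}}(Y \mid a) = \sum_{k=1}^{K} w_k \prod_{t=1}^{T} p_k(y_t \mid a)
```

Under these assumptions, the second form makes frames conditionally independent given the component but dependent once the component is summed out, which is one way to read "models intra-segment correlation indirectly".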

The alternative assumption of modeling based on a posteriori
probabilities is examined through the development of a recognition
formalism using classification and segmentation scoring. Posterior
distributions have been less well studied than conditional densities
in the context of CSR, and this work introduces a theoretically
consistent, segment-level posterior distribution model using
context-dependent models. Issues concerning fixed versus
variable-length representations and segmentation scoring are explored
experimentally. Finally, some general conclusions are drawn
concerning the practical and theoretical trade-offs for the models
examined.

The full thesis is available in PostScript format (803 kB).
