## Adaptation of Spectral Trajectory Models for Large Vocabulary Continuous Speech Recognition

### Ashvin Kannan

The goal of this dissertation is to develop effective strategies for adapting the acoustic parameters of a large vocabulary continuous speech recognition (LVCSR) system from a small amount of speech. Typically, this means adapting a system characterized by millions of parameters from only a few minutes of speech, which is possible only because the parameters are highly correlated.

To achieve this goal, a baseline LVCSR system is designed as the starting point for adaptation. Its acoustic model is a parsimonious representation of time variation: it characterizes a speech segment as a Gaussian process with a polynomial mean trajectory.
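
The segment model described above can be made concrete with a minimal Python sketch. It rests on simplifying assumptions that the abstract does not state: one scalar feature per frame, frame times normalized to [0, 1] over the segment so that segments of any duration share one trajectory, and frames that are independent given the mean with a single fixed variance. The names `normalized_times`, `mean_at`, and `segment_loglik` are hypothetical.

```python
import math


def normalized_times(n):
    """Map frames 0..n-1 of an n-frame segment onto [0, 1] (assumed time normalization)."""
    return [0.0] if n == 1 else [i / (n - 1) for i in range(n)]


def mean_at(coeffs, t):
    """Polynomial mean trajectory: mu(t) = coeffs[0] + coeffs[1]*t + coeffs[2]*t**2 + ..."""
    return sum(c * t ** r for r, c in enumerate(coeffs))


def segment_loglik(segment, coeffs, var):
    """Log-likelihood of a scalar segment under a polynomial-mean Gaussian.

    Assumption for illustration: frames are independent given the mean
    trajectory and share a single variance `var`.
    """
    total = 0.0
    for t, y in zip(normalized_times(len(segment)), segment):
        resid = y - mean_at(coeffs, t)
        total += -0.5 * (math.log(2 * math.pi * var) + resid * resid / var)
    return total


# Hypothetical quadratic trajectory: three coefficients, whatever the segment duration.
model = [1.0, 2.0, -1.0]
short_seg = [mean_at(model, t) + 0.05 for t in normalized_times(4)]
long_seg = [mean_at(model, t) - 0.05 for t in normalized_times(12)]
flat_seg = [1.5] * 8
for name, seg in [("short", short_seg), ("long", long_seg), ("flat", flat_seg)]:
    print(name, round(segment_loglik(seg, model, var=0.01), 2))
```

The 4-frame and 12-frame realizations are scored by the same three numbers, so the parameter count depends on the polynomial order rather than on segment duration; the flat segment, which does not follow the trajectory, scores far lower.
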
A maximum-likelihood algorithm for clustering polynomial trajectories is developed and used here for parameter tying, and later for adaptation. The recognition performance of this system is shown to be comparable to that of other state-of-the-art models for LVCSR.
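
The abstract does not spell out the clustering procedure. One plausible reading, sketched here purely as an assumption, is a k-means-style loop whose distortion is segment log-likelihood: each segment is assigned to the cluster whose trajectory model scores it highest, and each cluster's polynomial coefficients and variance are then re-estimated by maximum likelihood (least squares in normalized time) from the segments assigned to it. Scalar features, a shared polynomial order, the variance floor, the seeding scheme, and all function names (`fit_trajectory`, `cluster_trajectories`, and so on) are hypothetical.

```python
import math


def normalized_times(n):
    """Map frames 0..n-1 onto [0, 1] (assumed time normalization)."""
    return [0.0] if n == 1 else [i / (n - 1) for i in range(n)]


def mean_at(coeffs, t):
    return sum(c * t ** r for r, c in enumerate(coeffs))


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_trajectory(segments, order, var_floor=1e-4):
    """ML estimate of one polynomial trajectory and variance, pooled over segments."""
    pts = [(t, y) for s in segments for t, y in zip(normalized_times(len(s)), s)]
    k = order + 1
    # Normal equations Z'Z c = Z'y with polynomial basis columns t**0 ... t**order.
    zz = [[sum(t ** (i + j) for t, _ in pts) for j in range(k)] for i in range(k)]
    zy = [sum(y * t ** i for t, y in pts) for i in range(k)]
    coeffs = solve(zz, zy)
    var = sum((y - mean_at(coeffs, t)) ** 2 for t, y in pts) / len(pts)
    return coeffs, max(var, var_floor)


def segment_loglik(segment, model):
    coeffs, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (y - mean_at(coeffs, t)) ** 2 / var)
               for t, y in zip(normalized_times(len(segment)), segment))


def best_cluster(segment, models):
    """Index of the cluster model that gives the segment the highest log-likelihood."""
    return max(range(len(models)), key=lambda c: segment_loglik(segment, models[c]))


def cluster_trajectories(segments, seeds, order, max_iters=20):
    """Hard-assignment likelihood clustering: assign segments, re-fit clusters, repeat."""
    models = [fit_trajectory([segments[i]], order) for i in seeds]
    labels = None
    for _ in range(max_iters):
        new_labels = [best_cluster(s, models) for s in segments]
        if new_labels == labels:
            break  # assignments are stable, so re-fitting would not change the models
        labels = new_labels
        for c in range(len(models)):
            members = [s for s, lab in zip(segments, labels) if lab == c]
            if members:  # keep the previous model if a cluster loses all its segments
                models[c] = fit_trajectory(members, order)
    return labels, models


# Hypothetical data: rising and falling segments of different lengths, with small jitter.
rising = [[2 * t + 0.02 * (-1) ** i for i, t in enumerate(normalized_times(n))] for n in (5, 7, 9)]
falling = [[2 - 2 * t + 0.02 * (-1) ** i for i, t in enumerate(normalized_times(n))] for n in (6, 8)]
labels, models = cluster_trajectories(rising + falling, seeds=[0, 3], order=1)
print(labels)  # [0, 0, 0, 1, 1]
```

Seeding from chosen segments keeps the sketch deterministic; a practical system would need a more careful initialization and would operate on multidimensional feature vectors.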

Unlike non-parametric models, parametric trajectory models allow all parameters of a trajectory to be adapted jointly, using every observation in the segment. Maximum-likelihood and Bayesian adaptation algorithms are developed for such models under the assumption that the parameters of different sound classes are independent, where the classes are determined by clustering.
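
The contrast between maximum-likelihood and Bayesian adaptation can be illustrated for a single sound class. This sketch assumes a scalar feature, a known frame variance, and an independent Gaussian prior on each polynomial coefficient centered at a speaker-independent value; none of these details come from the abstract, and `ml_adapt`, `map_adapt`, and `normal_equations` are hypothetical names. Because every frame of every adaptation segment enters the same normal equations, all trajectory coefficients are updated jointly, even from one short segment.

```python
def normalized_times(n):
    """Map frames 0..n-1 onto [0, 1] (assumed time normalization)."""
    return [0.0] if n == 1 else [i / (n - 1) for i in range(n)]


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normal_equations(segments, order):
    """Accumulate Z'Z and Z'y over every frame of every adaptation segment."""
    pts = [(t, y) for s in segments for t, y in zip(normalized_times(len(s)), s)]
    k = order + 1
    zz = [[sum(t ** (i + j) for t, _ in pts) for j in range(k)] for i in range(k)]
    zy = [sum(y * t ** i for t, y in pts) for i in range(k)]
    return zz, zy


def ml_adapt(segments, order):
    """ML (least-squares) coefficients from the adaptation data alone."""
    zz, zy = normal_equations(segments, order)
    return solve(zz, zy)


def map_adapt(segments, prior_mean, prior_var, noise_var):
    """MAP coefficients under an independent Gaussian prior on each coefficient.

    Solves (Z'Z / noise_var + D) c = Z'y / noise_var + D prior_mean,
    with D = diag(1 / prior_var).
    """
    k = len(prior_mean)
    zz, zy = normal_equations(segments, k - 1)
    a = [[zz[i][j] / noise_var + (1.0 / prior_var[i] if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [zy[i] / noise_var + prior_mean[i] / prior_var[i] for i in range(k)]
    return solve(a, b)


si_model = [1.0, 2.0, -1.0]  # hypothetical speaker-independent coefficients
one_seg = [[0.5 + 1.0 * t for t in normalized_times(6)]]  # hypothetical adaptation segment
print("ML: ", ml_adapt(one_seg, 2))
print("MAP:", map_adapt(one_seg, si_model, [0.05] * 3, noise_var=0.01))
print("MAP, no data:", map_adapt([], si_model, [0.05] * 3, noise_var=0.01))  # equals the prior
```

With fewer frames than coefficients the ML system is singular, while the MAP system stays well posed because the prior adds a positive diagonal term; as data accumulates, the MAP estimate moves from the prior toward the ML estimate.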

Finally, the dependencies between different sound classes in a particular speaker's speech are modeled as a Gaussian multiscale process, defined by the evolution of a stochastic linear dynamical system on a tree. To adapt all sound classes when adaptation data is limited, adaptation is viewed as optimal smoothing of this process. Smoothing algorithms for such processes already existed, but estimating the process parameters from data had remained largely unsolved. This dissertation provides a maximum-likelihood solution to parameter estimation for dynamical systems defined on trees, based on the expectation-maximization (EM) algorithm.
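
As a rough illustration of adaptation as optimal smoothing, the sketch below assumes a scalar state at every node of a tree: the root is drawn from N(0, p0), each child equals `a` times its parent plus independent N(0, q) noise, and the adaptation data for a node is one noisy measurement with variance `r`. Leaves stand for sound classes and interior nodes for broader groupings. Because the process is jointly Gaussian, the optimal (minimum mean-squared error) smoothed estimate at each node is its conditional mean given the observed nodes. The code computes that mean by direct Gaussian conditioning on the full covariance, which is practical only for small trees; efficient two-sweep smoothing recursions for multiscale processes yield the same estimates. The EM parameter estimation is not shown. The scalar model, the scale-invariant parameters `a`, `q`, `r`, `p0`, and all function names are assumptions for illustration.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def depths(parent):
    """Depth of each node; assumes parent[0] is None and parent[i] < i."""
    d = [0] * len(parent)
    for i in range(1, len(parent)):
        d[i] = d[parent[i]] + 1
    return d


def ancestors(i, parent):
    """Path from node i up to the root, inclusive."""
    path = [i]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def prior_cov(parent, a, q, p0):
    """Prior covariance of the tree process x_child = a * x_parent + w, w ~ N(0, q)."""
    d = depths(parent)
    var = [p0]  # var[k]: prior variance of any node at depth k
    for _ in range(max(d)):
        var.append(a * a * var[-1] + q)
    n = len(parent)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        up_i = set(ancestors(i, parent))
        for j in range(n):
            lca = next(k for k in ancestors(j, parent) if k in up_i)  # lowest common ancestor
            cov[i][j] = a ** (d[i] - d[lca]) * a ** (d[j] - d[lca]) * var[d[lca]]
    return cov


def smooth(parent, obs, a, q, p0, r):
    """Conditional mean E[x | observations] at every node (zero prior mean)."""
    cov = prior_cov(parent, a, q, p0)
    seen = sorted(obs)
    s = [[cov[i][j] + (r if i == j else 0.0) for j in seen] for i in seen]
    w = solve(s, [obs[i] for i in seen])
    return [sum(cov[k][i] * wi for i, wi in zip(seen, w)) for k in range(len(parent))]


# Hypothetical tree: root 0; broad classes 1 and 2; leaf classes 3, 4 under 1 and 5, 6 under 2.
parent = [None, 0, 0, 1, 1, 2, 2]
est = smooth(parent, obs={3: 1.0}, a=1.0, q=1.0, p0=1.0, r=0.1)
print([round(e, 3) for e in est])
```

Observing only leaf 3 moves its sibling, leaf 4, further than the leaves under the other branch, which is how classes without adaptation data borrow strength from related classes.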

Results are presented on the Wall Street Journal and Switchboard corpora,
and recognition performance gains are achieved in both supervised and
unsupervised adaptation scenarios.

The full thesis is available in PostScript format (1.20 MB).

Return to the SSLI Lab Graduate Students Theses Page.