High-Order Modeling Techniques for Continuous Speech
The goal of this work is to develop and explore novel stochastic
modeling techniques for acoustic and language modeling in large
vocabulary continuous speech recognition, particularly recognition of
spontaneous speech. Although significant advances have been made in
recognition technology in recent years, the accuracy of spontaneous
speech recognition remains little better than 50%. More casual speaking modes
introduce additional sources of variability that require improvements
at all levels of the recognition process, both in terms of the
baseline stochastic models and the techniques for adapting these
models. In addressing these challenges, the general theme of the
research in this project is high-level correlation modeling, i.e.,
representing correlation of observations beyond the level of the frame
or the word, capturing dependencies within and across utterances
associated with speaker, channel, topic, and/or speaking style.
Continuing the ARPA-ONR funded work at Boston University (BU) on
segment-based acoustic modeling for speech recognition, the current
project builds on the stochastic segment model, algorithms developed
for distribution clustering in acoustic modeling and sentence-level
mixture language modeling, and the BU recognition system in general.
The recognition framework also includes a multi-pass search strategy
to accommodate the higher-order (and therefore more computationally
expensive) models explored here. In particular, we concentrate on three
problems: development of hierarchical models of intra-utterance
correlation of phones and model states, e.g. by extending the theory
of Markov dependence trees; unsupervised adaptation of acoustic models
within and across utterances based on these models; and sub-language
modeling triggered by acoustic and dialog-level cues. In all cases,
the approach involves developing formal models of statistical
dependence that overcome limitations of existing models, in
combination with exploring fast search and robust parameter estimation
techniques to address the added complexity of these models. Although
we consider radically new models, we also build on the existing
strengths of speech recognition technology, both in the theoretical
foundation and in the use of multi-pass search, with the intention
that advances can be easily used in existing systems.
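The dependence tree models mentioned above can be illustrated with a
small sketch. The classic construction (Chow-Liu, which the cited work
on EM estimation of dependence tree models builds upon) fits a tree of
pairwise dependencies by maximizing empirical mutual information. The
code below is a minimal illustration on toy binary data, not the
project's actual implementation; all data and function names are
assumptions for the example.

```python
# Sketch: fitting a Chow-Liu dependence tree to binary feature vectors.
# Illustrative only -- not the models or code developed in this project.
import math
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between binary variables i and j."""
    n = len(data)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for row in data if row[i] == a and row[j] == b) / n
            p_a = sum(1 for row in data if row[i] == a) / n
            p_b = sum(1 for row in data if row[j] == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    """Maximum-weight spanning tree over pairwise mutual information
    (Kruskal's algorithm with a simple union-find)."""
    d = len(data[0])
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: variable 1 copies variable 0; variable 2 is independent.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_tree(data))  # the strong edge (0, 1) is selected first
```

The tree structure captures the strongest pairwise dependencies while
keeping the joint distribution tractable, which is what makes such
models attractive for modeling correlation across phones and states.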
(January 1995 -- December 1997)
SPONSOR: DoD, Office of Naval Research ONR-N00014-92-J-1778
The publications below were supported in whole or in part by this
grant. Publications supported by a previous related ONR-ARPA
grant are listed separately.
"Parameter Estimation of Dependence Tree Models Using the EM
Algorithm," O. Ronen, J. R. Rohlicek and M. Ostendorf, manuscript
submitted to IEEE Signal Processing Letters, Vol. 2, No. 8, August
1995, pp. 157-159.
"From HMMs to Segment Models: A Unified View of Stochastic Modeling
for Speech Recognition," M. Ostendorf, V. Digalakis and O. Kimball,
manuscript submitted to IEEE Trans. on Speech and Audio Processing.
"Lattice-based Search Strategies for Large Vocabulary Speech
Recognition," F. Richardson, Boston University M.S. Thesis, 1994.
"Auditory-based signal processing for speech recognition,"
S. Zlotkin, Boston University B.S. Project, 1995.
"The 1994 BU NAB News Benchmark System," M. Ostendorf, F.
Richardson, R. Iyer, A. Kannan, O. Ronen and R. Bates, Proceedings of
the ARPA Workshop on Spoken Language Technology, 1995, pp. 139-142.
"Lattice-based Search Strategies for Large Vocabulary Recognition,"
F. Richardson, M. Ostendorf and J. R. Rohlicek, Proceedings of
the International Conference on Acoustics, Speech and Signal
Processing, pp. 576-579, May 1995.