Joint Lexicon and Acoustic Model Design for Spontaneous Speech Recognition
Automatic speech recognition systems typically include a model
representing the acoustic patterns of sub-word units, a lexicon
specifying each word's pronunciation in terms of these units, and a
language model that characterizes the likelihood of different word
sequences. Although most parameters in a speech recognition system
are estimated from data by optimizing an objective function, the unit
inventory and lexicon are generally hand-crafted and therefore
unlikely to be optimal. This project develops a joint
solution to the related problems of learning a unit inventory and
corresponding lexicon from data. The initial stage of the work
focused on unit design for the case where there is a single
pronunciation per word, resulting in a system that significantly
outperforms current phone-based approaches on the Resource Management
corpus, a 1000-word vocabulary task. However, this approach requires every
word in the lexicon to be observed in training, which is not practical for a
large-vocabulary task. Therefore, we extended the algorithm for use in
a hybrid system that uses automatically derived units when these are
more likely than their phone-based counterparts, yielding a small
improvement in recognition performance on conversational speech (the
Switchboard task). Current work focuses on extensions to represent cross-word
context and to learn multiple pronunciations. The objective is to
improve large vocabulary speech recognition performance on spontaneous
conversational speech, which has proved to be among the most difficult
of all speech recognition problems.
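
The hybrid idea described above can be illustrated with a minimal sketch. It is not the project's algorithm. It assumes the choice is made per word, that each candidate pronunciation already has a training-data log-likelihood, and that words unseen in training have no derived entry. The function name, the unit labels (u17, u03, ...), and all scores are hypothetical and only illustrate the selection rule.

```python
from typing import Dict, List, Tuple

Pronunciation = List[str]


def build_hybrid_lexicon(
    phone_lexicon: Dict[str, Pronunciation],
    derived_lexicon: Dict[str, Pronunciation],
    phone_loglik: Dict[str, float],
    derived_loglik: Dict[str, float],
) -> Dict[str, Tuple[str, Pronunciation]]:
    """Choose, for each word, the pronunciation with the higher log-likelihood.

    Assumption (not from the source): selection is per word, and the
    log-likelihoods were computed elsewhere on the training data.
    """
    hybrid: Dict[str, Tuple[str, Pronunciation]] = {}
    for word, phone_pron in phone_lexicon.items():
        derived_pron = derived_lexicon.get(word)
        # Words not observed in training have no derived pronunciation,
        # so they fall back to the phone-based entry.
        if derived_pron is None or word not in derived_loglik:
            hybrid[word] = ("phone", phone_pron)
            continue
        # Use the derived units only when they are strictly more likely.
        if derived_loglik[word] > phone_loglik.get(word, float("-inf")):
            hybrid[word] = ("derived", derived_pron)
        else:
            hybrid[word] = ("phone", phone_pron)
    return hybrid


# Toy data: made-up values for illustration, not experimental results.
phone_lexicon = {
    "yeah": ["y", "ae"],
    "okay": ["ow", "k", "ey"],
    "zebra": ["z", "iy", "b", "r", "ah"],  # treated as unseen in training
}
derived_lexicon = {
    "yeah": ["u17", "u03"],
    "okay": ["u22", "u08", "u41"],
}
phone_loglik = {"yeah": -120.5, "okay": -98.0}
derived_loglik = {"yeah": -110.2, "okay": -101.3}

hybrid = build_hybrid_lexicon(
    phone_lexicon, derived_lexicon, phone_loglik, derived_loglik
)
for word, (source, pron) in hybrid.items():
    print(word, source, " ".join(pron))
```

In this toy case "yeah" takes the derived units, "okay" keeps its phone-based pronunciation, and "zebra" falls back to phones because it has no derived entry.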
(August 1996 -- May 1999)
SPONSOR: ATR Interpreting Telecommunications Laboratories
- ``Joint Acoustic Unit Design and Lexicon Generation,''
M. Bacchiani and M. Ostendorf, Proc. of the ESCA Workshop on Modeling
Pronunciation Variation for Automatic Speech Recognition, 1998, pp. 7-12.
- ``Joint Lexicon, Acoustic Unit Inventory and Model Design,'' M. Bacchiani
and M. Ostendorf, submitted manuscript.
- ``Using Automatically-Derived Acoustic Subword Units in Large
Vocabulary Speech Recognition,'' M. Bacchiani and M. Ostendorf,
Proceedings of the International Conference on Spoken Language
Processing, 1998, vol. 5, pp. 1843-1846.