Modeling Uncertainty for Information Extraction from Speech Data
David Palmer
Abstract
The goal of this thesis is to develop techniques for post-processing
the output of an automatic speech recognition (ASR) system.
Specifically, our approach explicitly models ASR errors, which
represent an important difference between speech and text-based
data, such as newspaper and newswire data. Our approach provides a
tight coupling between the ASR system and the natural language
processing (NLP) system analyzing its output, resulting in a richer
information flow than a simple sequence of output words. We
introduce a new probabilistic model for text-based information
extraction that improves on previous text and speech data results
for a range of ASR error conditions. We describe a method for
explicitly modeling uncertainty in the ASR output, and we present
our experimental results showing that our uncertainty modeling
improves the performance of an information extraction system. Our
experiments demonstrate that one component of our uncertainty model,
word-level confidence scores, can be improved by using long-range
document-level and task-dependent information, in addition to the
local acoustic and language model information provided by the ASR
system. Our method for explicitly modeling errors allows for
multiple error types, and we demonstrate improved performance by
representing both out-of-vocabulary and in-vocabulary errors. We
introduce a multi-pass processing framework, in which our
uncertainty modeling assists in the identification of name phrases
in ASR output, and the name phrases thereby identified are used to
produce a better uncertainty model. The combination of the various
advances in uncertainty modeling leads to a 17% reduction in the
error rate of our spoken language information extraction system
compared to our state-of-the-art text-based system. Finally, we
describe a novel use of our text-based information extraction system
to provide efficient and broad coverage of out-of-vocabulary words
in the ASR data, and we use a phonetic distance ranking to order the
vocabulary and to correct ASR errors. This results in a 97%
reduction in the size of the vocabulary, with only a 6% decrease in
the vocabulary coverage.
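
To illustrate the idea of ranking vocabulary entries by phonetic distance,
the following is a minimal sketch, not the method used in the thesis. It
assumes that each word has a pronunciation given as a list of phoneme
symbols and that phonetic distance is a plain unit-cost edit distance
between phoneme sequences; the thesis may use a different, weighted
measure. The function names, the toy lexicon, and the phoneme symbols are
all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two phoneme sequences.

    Assumption: every insertion, deletion, and substitution costs 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]  # deleting the first i phonemes of a
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_candidates(
    hypothesis: Sequence[str], lexicon: Dict[str, List[str]]
) -> List[Tuple[int, str]]:
    """Return (distance, word) pairs, closest pronunciation first."""
    scored = [(phoneme_edit_distance(hypothesis, pron), word)
              for word, pron in lexicon.items()]
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical toy lexicon of candidate words and pronunciations.
    toy_lexicon = {
        "palmer": ["P", "AA", "M", "ER"],
        "farmer": ["F", "AA", "R", "M", "ER"],
    }
    # Hypothetical phoneme string for a misrecognized region of ASR output.
    print(rank_candidates(["P", "AA", "L", "M", "ER"], toy_lexicon))
```

In a setting like the one described above, the candidate words would come
from names found by the text-based extraction system, and the top-ranked
candidates would be proposed as corrections for suspected ASR errors.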