Modeling Uncertainty for Information Extraction from Speech Data

David Palmer

Abstract

The goal of this thesis is to develop techniques for post-processing the output of an automatic speech recognition (ASR) system. Specifically, our approach explicitly models ASR errors, which represent an important difference between speech data and text-based data, such as newspaper and newswire data. Our approach provides a tight coupling between the ASR system and the natural language processing (NLP) system analyzing its output, resulting in a richer information flow than a simple sequence of output words. We introduce a new probabilistic model for information extraction that improves on previous results for both text data and speech data across a range of ASR error conditions. We describe a method for explicitly modeling uncertainty in the ASR output, and we present experimental results showing that our uncertainty modeling improves the performance of an information extraction system. Our experiments demonstrate that one component of our uncertainty model, word-level confidence scores, can be improved by using long-range document-level and task-dependent information in addition to the local acoustic and language model information provided by the ASR system. Our method for explicitly modeling errors allows for multiple error types, and we demonstrate improved performance by representing both out-of-vocabulary and in-vocabulary errors. We introduce a multi-pass processing framework, in which our uncertainty modeling assists in the identification of name phrases in ASR output, and the name phrases thereby identified are used in turn to produce a better uncertainty model. The combination of these advances in uncertainty modeling leads to a 17% reduction in the error rate of our spoken language information extraction system compared to our state-of-the-art text-based system.
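One way to picture the confidence-score improvement described above is as a combination of the local score the ASR system emits with a document-level feature. The sketch below is purely illustrative and not the thesis's actual model: the function name, the single document-level feature (how often the word recurs in the document), and the fixed weights are all hypothetical assumptions, shown only to make the idea of mixing local and long-range evidence concrete.

```python
import math

def combined_confidence(asr_conf, doc_recurrence, weights=(2.0, 1.0, -1.5)):
    """Illustrative (hypothetical) word confidence combining two evidence sources.

    asr_conf       -- local confidence in [0, 1] from the recognizer's
                      acoustic and language model scores
    doc_recurrence -- document-level feature in [0, 1], e.g. how often this
                      word hypothesis recurs elsewhere in the document
    weights        -- (w_asr, w_doc, bias): fixed weights chosen by hand here;
                      a real system would estimate them from held-out data
    """
    w_asr, w_doc, bias = weights
    # Logistic combination keeps the output interpretable as a probability.
    z = w_asr * asr_conf + w_doc * doc_recurrence + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Under this toy scheme, a word that the recognizer scores weakly but that recurs consistently across the document receives a higher combined confidence than its local score alone would suggest.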
Finally, we describe a novel use of our text-based information extraction system to provide efficient and broad coverage of out-of-vocabulary words in the ASR data, and we use a phonetic distance ranking to order the vocabulary and to correct ASR errors. This results in a 97% reduction in the size of the vocabulary, with only a 6% decrease in the vocabulary coverage.
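The phonetic distance ranking mentioned above can be sketched with a standard edit distance over phoneme sequences; this is a minimal illustration under assumed details, not the thesis's implementation. The phone inventory, the uniform edit costs, and the tiny lexicon are all hypothetical, and a real system would use weighted phone confusion costs.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (uniform costs)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n]

def rank_vocabulary(hyp_phones, lexicon):
    """Order vocabulary entries by phonetic distance to a hypothesized word.

    lexicon maps each word to its phoneme sequence; the closest words come
    first, so an erroneous ASR hypothesis can be corrected to its nearest
    in-vocabulary neighbor.
    """
    return sorted(lexicon, key=lambda w: phone_edit_distance(hyp_phones, lexicon[w]))
```

For example, given a hypothesized pronunciation, the top-ranked lexicon entry is the phonetically nearest candidate for correcting the recognized word.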

The full thesis in postscript format.

