D. Palmer and M. Ostendorf,
"Robust Information Extraction from Automatically Generated Speech Transcriptions,"
Speech Communication, to appear.

This paper describes a robust system for information extraction from spoken language data. The system extends previous HMM work in information extraction. It combines a state topology designed to model variable-length phrases explicitly with class-based statistical language model smoothing, and it achieves state-of-the-art performance across a wide range of speech recognition error rates. Experiments on broadcast news data show that the system performs well despite temporal and source differences in the data. In addition, the paper introduces strategies for integrating word-level confidence estimates into the model. These strategies show that performance improves when a generic error token is used for incorrectly recognized words in the training data and for low-confidence words in the test data.
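A minimal Python sketch follows. It illustrates only the generic error-token idea described in the abstract, not the paper's implementation. It assumes two inputs. Training hypotheses come with per-word correctness labels, for example from aligning recognizer output against reference transcripts. Test hypotheses come with per-word confidence scores. The token string `<err>`, the 0.5 threshold, and the function names are hypothetical choices made for illustration.

```python
# Hypothetical token and threshold; the paper's actual choices are not stated here.
ERROR_TOKEN = "<err>"
DEFAULT_THRESHOLD = 0.5


def mask_training_words(hyp_words, is_correct):
    """Replace misrecognized training words with the generic error token.

    `is_correct` is assumed to hold one boolean per hypothesis word,
    e.g. obtained by aligning recognizer output with reference text.
    """
    if len(hyp_words) != len(is_correct):
        raise ValueError("one correctness label is needed per word")
    return [w if ok else ERROR_TOKEN for w, ok in zip(hyp_words, is_correct)]


def mask_test_words(hyp_words, confidences, threshold=DEFAULT_THRESHOLD):
    """Replace test words whose confidence falls below `threshold`."""
    if len(hyp_words) != len(confidences):
        raise ValueError("one confidence score is needed per word")
    return [w if c >= threshold else ERROR_TOKEN
            for w, c in zip(hyp_words, confidences)]


if __name__ == "__main__":
    print(mask_training_words(["the", "mayor", "sad"], [True, True, False]))
    print(mask_test_words(["new", "york", "tim"], [0.9, 0.8, 0.2]))
```

Both functions emit the same token, so the extraction model sees one consistent symbol for unreliable words at training time and at test time.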
