D. Palmer and M. Ostendorf,
"Robust Information Extraction from Automatically Generated Speech Transcriptions,"
Speech Communication, to appear.
This paper describes a robust system for information extraction from
spoken language data. The system extends previous HMM work in
information extraction, using a state topology designed for explicit modeling
of variable-length phrases, together with class-based statistical language
model smoothing, to produce state-of-the-art performance across a wide range
of speech recognition error rates. Experiments on broadcast news data show
that the system performs well despite temporal and source differences in
the data. In addition, strategies for integrating word-level
confidence estimates into the model are introduced; performance
improves when a generic error token is used for incorrectly recognized
words in the training data and low-confidence words in the test data.
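
The error-token strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The token name `<ERR>`, the function names, the 0.5 threshold, and the assumption that training-side correctness flags come from aligning recognizer output against a reference transcript are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Hypothetical symbol for the generic error token; the abstract does not name it.
ERROR_TOKEN = "<ERR>"


def mask_training_words(hyp: Sequence[str], correct: Sequence[bool]) -> List[str]:
    """Replace recognizer outputs flagged as incorrect with the error token.

    Assumption: correct[i] is True when hyp[i] matches the reference
    transcript, which is only available for training data.
    """
    if len(hyp) != len(correct):
        raise ValueError("hyp and correct must have the same length")
    return [word if ok else ERROR_TOKEN for word, ok in zip(hyp, correct)]


def mask_test_words(
    hyp: Sequence[str], confidence: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Replace low-confidence recognizer outputs with the error token.

    Assumption: confidence[i] is a word-level score in [0, 1], and the
    threshold value is an illustrative placeholder, not a value from the paper.
    """
    if len(hyp) != len(confidence):
        raise ValueError("hyp and confidence must have the same length")
    return [
        word if score >= threshold else ERROR_TOKEN
        for word, score in zip(hyp, confidence)
    ]
```

At training time the model sees the error token wherever the recognizer was wrong. At test time, where no reference exists, the confidence score stands in for correctness, so that both conditions present the model with the same symbol.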