DARPA Communicator Project:
Robust Recognition and Dialog Tracking for
Interactive Information Access
As computing and telecommunications technology matures, there is tremendous
potential for storage of and access to information, but also the daunting
problem of information management. To reach a broad community of users,
information management technology must be easy for non-specialists to use
and accessible via portable, task-specific devices that make use of wireless
networks to connect to distributed services. Spoken language is the
most natural and simple mechanism for humans to access information,
but it is also a source of information. Speech is the medium for a
wide range of everyday "information sources", from television and radio
broadcasts to voice mail to briefings at meetings; improved access to and
management of this information would be very valuable. This project looks
at speech as both the interface and the information source, focusing in
particular on recorded group meetings.
While recent performance gains in speech recognition and human-computer
dialog systems make it possible to envision scenarios of speech-based information
management, current technology is not sufficiently robust for useful systems.
There has been some recent work on using automatic speech recognition
to transcribe spoken data for use with standard information retrieval techniques,
but it has mostly focused on broadcast news rather than conversational speech,
where low recognition accuracy can hurt retrieval performance and make
the written form difficult to read. Furthermore, speech recognition technology
is brittle in the presence of reverberation and ambient noise, as well
as sensitive to packet loss in wireless environments. Dialog tracking,
which is needed both for interface control and information indexing, is
in its infancy. The research described below will address aspects of these
two technologies, or potential "agents", for use in both human-computer
interaction (the user interface) and spoken information management.
Speech Recognition: As the first step in the information creation
or access process, speech recognition performance is a critical determinant
of system success. No amount of sophistication in the language processing
step can correct for misrecognized key words, so accurate and robust
recognition is essential for easy-to-use interfaces and speech transcription.
This is a particular problem for the goal of an "invisible computer":
the user will typically not be wearing a head-mounted microphone,
so reverberation and noise have a larger effect on recognition.
Thus, this project involves research on:
-
Speech coding for discrete HMMs in wireless environments, for robust performance
and graceful degradation in the presence of packet loss
-
Robust front-end analysis to handle noisy and reverberant environments
-
Improved acoustic models to better use new front-end analysis techniques
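As a rough illustration of what "graceful degradation in the presence of packet loss" can mean at the feature level, the sketch below fills in feature frames that were lost in transit by linear interpolation between the nearest received frames. This is a generic concealment technique chosen only for illustration; it is not the coding scheme developed in this project. The function name `conceal_lost_frames` and the representation of a frame as a list of floats (with `None` for a lost frame) are assumptions.

```python
from typing import List, Optional

Frame = List[float]


def conceal_lost_frames(frames: List[Optional[Frame]]) -> List[Frame]:
    """Fill lost frames (None) from their nearest received neighbors.

    Interior losses are linearly interpolated; losses at the start or
    end of the sequence copy the nearest received frame.
    (Hypothetical helper, for illustration only.)
    """
    received = [i for i, f in enumerate(frames) if f is not None]
    if not received:
        raise ValueError("no frames were received")

    out: List[Frame] = []
    for i, frame in enumerate(frames):
        if frame is not None:
            out.append(list(frame))
            continue
        # Indices of the closest received frames on each side, if any.
        prev = max((j for j in received if j < i), default=None)
        nxt = min((j for j in received if j > i), default=None)
        if prev is None:
            out.append(list(frames[nxt]))
        elif nxt is None:
            out.append(list(frames[prev]))
        else:
            w = (i - prev) / (nxt - prev)  # 0 at prev, 1 at nxt
            out.append([(1 - w) * a + w * b
                        for a, b in zip(frames[prev], frames[nxt])])
    return out
```

With this kind of concealment, isolated losses distort the feature stream only slightly, while long bursts of loss degrade it progressively rather than causing an abrupt failure.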
Dialog Tracking: For a user interface based on human-computer dialog,
it is important to track the dialog context in order to be able to resolve
ambiguous references, to interpret information as a correction vs. a new
piece of information, and to know when to clear the dialog context for
initiating a new task. Without this ability, users need to form overly
detailed queries and/or face tedious interaction sessions because of slow
error recovery. For the meeting browser task, topic and speech act annotation
is important for introducing structure that facilitates random access and
fast-forward and reverse commands. Such annotation allows a user to quickly
jump to the first mention of a topic and browse follow-on points, or to
find a particular person's response to a posed question. Aspects of dialog
tracking that this research addresses are:
-
Detection of error correction sub-dialogs
-
Recognition of speech acts combining prosodic and language cues
-
Speaker tracking
-
Topic tracking and automatic sub-topic structure annotation
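The context-tracking decisions described above (treating an utterance as a correction versus new information, and clearing the context when a new task starts) can be pictured with a toy slot-filling tracker. Everything in this sketch is hypothetical: the `DialogContext` class, the slot names, and the simple rule that a different value for an already filled slot counts as a correction. A real system would need a trained detector in place of that rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DialogContext:
    """Toy dialog context: filled slots plus a count of corrections."""
    slots: Dict[str, str] = field(default_factory=dict)
    corrections: int = 0

    def update(self, slot: str, value: str) -> str:
        """Record a slot value and label how it relates to the context."""
        if slot in self.slots:
            if self.slots[slot] == value:
                return "confirmation"   # same value repeated
            self.slots[slot] = value    # stand-in rule: changed value
            self.corrections += 1
            return "correction"
        self.slots[slot] = value
        return "new"

    def start_new_task(self) -> None:
        """Clear the context so old slots do not leak into a new task."""
        self.slots.clear()
        self.corrections = 0


ctx = DialogContext()
labels = [
    ctx.update("destination", "Boston"),
    ctx.update("date", "Friday"),
    ctx.update("destination", "Austin"),  # user fixes a misrecognition
]
```

Here the third update is labeled a correction because it overwrites a slot that was already filled; calling `start_new_task()` afterwards empties the context.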
Response Generation: In human-computer dialog systems, the quality
of the voice response is important for user acceptance of the system,
and the choice of wording is an important part of keeping the dialog
on track. Aspects of response generation that this research addresses are:
-
Using dialog state (detected error corrections) to change the language
generation strategy for faster error recovery
-
Integrated speech synthesis and language generation
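One way to picture the first item is a generator that picks its prompt style from the number of error corrections detected so far. The sketch below is only an illustration under assumed rules: the function `choose_prompt`, its thresholds, and the prompt wording are all hypothetical and are not taken from the project's system.

```python
def choose_prompt(slot: str, value: str, recent_corrections: int) -> str:
    """Pick a system prompt based on how many corrections were detected.

    Hypothetical policy: move from implicit confirmation to explicit
    confirmation to a tightly constrained re-prompt as errors pile up.
    """
    if recent_corrections < 0:
        raise ValueError("recent_corrections must be non-negative")
    if recent_corrections == 0:
        # Dialog is on track: confirm implicitly and keep moving.
        return f"OK, {value}. What else can I help with?"
    if recent_corrections == 1:
        # One detected correction: confirm the value explicitly.
        return f"Did you say {value} for the {slot}?"
    # Repeated corrections: narrow the expected answer to a single slot.
    return f"Sorry about that. Please say only the {slot}."
```

The design intent is that more constrained prompts tend to elicit shorter, easier-to-recognize replies, which is one plausible route to faster error recovery.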
(March 2000 -- March 2003)
SPONSORS: DARPA No. N660019928924, IBM (see also Language
Modeling Project), NSF (see also Speech Generation Project)
Become a user of the UW-Communicator
This project is also linked to the UW Portolano
Project on Invisible Computing.
TEAM MEMBERS:
-
PIs: Prof. Mari Ostendorf, UW; Prof. Nelson Morgan, ICSI and UC Berkeley
-
UW Participants: SSLI Lab
Jeff Bilmes, Asst. Professor
Eve Riskin, Professor
Katrin Kirchhoff, Research Asst. Professor
Costas Boulis, Ph.D. candidate
Scott Otterson, Ph.D. candidate
Chia-Ping Chen, Ph.D. candidate
Sarah Schwarm, Ph.D. candidate
Julie Goldberg, Ph.D. candidate
Arindam Mandal, Ph.D. candidate
Ivan Bulyko, Ph.D. 2002 (now a Research Associate at UW)
-
ICSI Participants: The Speech Group at ICSI
Adam Janin, Ph.D. candidate
Dave Gelbart, Ph.D. candidate
Don Baron, M.S. 2002
Liz Shriberg, Research Staff
Andreas Stolcke, Research Staff
Dan Ellis, (Former) Research Staff, now at Columbia University
Eric Fosler-Lussier, (Former) Post-Doc, now at Lucent
Jane Edwards, Research Staff
Jeremy Ang, M.S. candidate
Sonali Bhagat, undergraduate
Raj Dhillon, undergraduate
APPLICATION TEST-BEDS:
-
Travel Task:
-
Goals include: improve recognition performance in the presence of packet
loss, reliable detection of error correction sub-dialogs
-
Information on our telephone
system (based on the Univ. of Colorado Communicator system)
To try the system, call 1-877-890-2630
-
Meeting Recorder Task:
PRESENTATIONS:
- Sept 00 DARPA PI Meeting: UW, ICSI
- July 01 DARPA PI Meeting: UW, ICSI
PUBLICATIONS:
-
H. J. Nock and S. J. Young,
"WISP: A Comparison of Exact and Approximate Algorithms for Decoding
and Training Loosely-Coupled HMMs", Proc WISP (Institute of
Acoustics) 2001, Stratford-upon-Avon, UK
-
H. J. Nock, Techniques for Modelling Phonological Processes in
Automatic Speech Recognition, Ph.D. dissertation, Cambridge University
Engineering Dept., August 2001.
-
I. Bulyko and M. Ostendorf, "Joint Prosody Prediction and
Unit Selection for Concatenative Speech Synthesis", ICASSP
2001.
-
N. Morgan, D. Baron, J. Edwards, D. Ellis, D. Gelbart, A. Janin,
T. Pfau, E. Shriberg, and A. Stolcke, "The Meeting Project at ICSI,"
Proc. Human Language Technologies Conference, San Diego, March 2001
-
A. Janin and N. Morgan, "SpeechCorder, The Portable Meeting Recorder,"
Workshop on Hands-Free Speech Communication
Kyoto, Japan, April 2001.
-
A. Janin, "Meeting Recorder," Avios, San Jose, April 2001.
-
K. Kirchhoff, "A comparison of classification techniques for the automatic
detection of error corrections in human-computer dialogues",
Proc. NAACL Workshop on
Adaptation in Dialogue Systems, pp. 33-40, Pittsburgh, June 2001.
-
E. Shriberg, A. Stolcke, and D. Baron, "Observations on Overlap:
Findings and Implications for Automatic Processing of Multi-Party
Conversation," Proc. of Eurospeech, September 2001.
-
I. Bulyko and M. Ostendorf, ``Unit Selection for Speech Synthesis
Using Splicing Costs with Weighted Finite State Transducers,''
Proc. of Eurospeech, September 2001.
-
C. Boulis, M. Ostendorf, S. Otterson and E. Riskin, ``Graceful
Degradation of Speech Recognition Performance Over Lossy Packet
Networks,'' Proc. of Eurospeech, September 2001.
-
D. Gelbart and N. Morgan, "Evaluating Long-term Spectral Subtraction
for Reverberant ASR," Proc. IEEE ASRU Workshop, Madonna di
Campiglio, Italy, December 2001.
-
T. Pfau, D.P.W. Ellis, and A. Stolcke, "Multispeaker Speech Activity
Detection for the ICSI Meeting Recorder," Proc. IEEE ASRU
Workshop, Madonna di Campiglio, Italy, 2001.
-
E. Shriberg, A. Stolcke, and D. Baron, "Can Prosody Aid the Automatic
Processing of Multi-Party Meetings? Evidence from Predicting
Punctuation, Disfluencies, and Overlapping Speech," In Proc. ISCA
Tutorial and Research Workshop on Prosody in Speech Recognition and
Understanding, pp. 139-146, Red Bank, NJ, 2001.
-
C. Benitez, L. Burget, B. Chen, S. Dupont, H. Garudadri, H. Hermansky,
P. Jain, S. Kajarekar, S. Sivadas, "Robust ASR front-end using
spectral-based and discriminant features: experiments on the Aurora
tasks," Proc. of Eurospeech, September 2001.
-
H. J. Nock and S. J. Young,
"Modelling Asynchrony in ASR Using Loosely-Coupled HMMs",
Cognitive Science, 26(3):283-301, 2002. (Invited Paper)
-
S. Schwarm and M. Ostendorf, ``Text normalization with varied data sources for conversational
speech language modeling,'' Proc. ICASSP, vol. I, pp. 789-792, 2002.
-
C.-P. Chen, K. Kirchhoff, and J. Bilmes, ``Towards Simple Methods of
Noise Robustness,'' UWEETR-2002-0002, 2002.
-
D. Baron, Prosody-based automatic detection of punctuation and interruption
events in the ICSI meeting recorder corpus,
MS Thesis, EE Dept., UC Berkeley, 2002.
-
I. Bulyko and M. Ostendorf, ``Efficient Integrated Response Generation
from Multiple Targets using Weighted Finite State Transducers,''
Computer Speech and Language, July 2002.
-
I. Bulyko and M. Ostendorf, "A bootstrapping approach to automating
prosodic annotation for limited-domain synthesis," Proc. IEEE
Workshop on Speech Synthesis, Sept. 2002.
-
M. Ostendorf and I. Bulyko, "The impact of speech recognition on speech synthesis," Proc. IEEE
Workshop on Speech Synthesis, Sept. 2002.
-
C.-P. Chen, K. Filali, and J. Bilmes, ``Frontend Post-Processing
and Backend Model Enhancement on the Aurora 2.0/3.0 Databases,''
Proc. ICSLP, 2002.
-
C.-P. Chen, J. Bilmes, and K. Kirchhoff, ``Low-Resource
Noise-Robust Feature Post-Processing on Aurora 2.0,''
Proc. ICSLP, 2002.
-
A. Adami et al., "Qualcomm-ICSI-OGI Features for ASR,"
Proc. ICSLP, 2002.
-
J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke,
"Prosody-Based Automatic Detection of Annoyance and
Frustration in Human-Computer Dialog," Proc. ICSLP, 2002.
-
D. Baron, E. Shriberg, and A. Stolcke, "Automatic Punctuation and Disfluency
Detection in Multi-Party Meetings Using Prosodic and Lexical Cues,"
Proc. ICSLP, 2002.
-
D. Gelbart and N. Morgan, "Double the Trouble: Handling Noise and
Reverberation in Far-Field Automatic Speech Recognition,"
Proc. ICSLP, 2002.
-
C. Boulis, M. Ostendorf, E. Riskin, and S. Otterson, ``Graceful
Degradation of Speech Recognition Performance over Packet-Erasure
Networks,'' IEEE Transactions on Speech and Audio Processing,
to appear.
-
H. Nock and M. Ostendorf, ``Parameter reduction schemes for loosely coupled
HMMs,'' submitted to Computer Speech and Language.
-
S. Bhagat, A. Krupski, R. Dhillon, and E. Shriberg, "Guide for
Labeling Dialog Acts in Multi-Party Meetings," ICSI Technical
Report, in preparation.