DARPA Communicator Project:
Robust Recognition and Dialog Tracking for Interactive Information Access

As computing and telecommunications technology matures, there is tremendous potential for storage of and access to information, but also the daunting problem of information management. To reach a broad community of users, information management technology must be easy for non-specialists to use and accessible via portable, task-specific devices that use wireless networks to connect to distributed services. Spoken language is the most natural and simple mechanism for humans to access information, but it is also a source of information. Speech is the medium for a wide range of everyday "information sources", from television and radio broadcasts to voice mail to briefings at meetings; improved access to and management of this information would be very valuable. This project looks at speech as both the interface and the information source, focusing in particular on recorded group meetings.

While recent performance gains in speech recognition and human-computer dialog systems make it possible to envision scenarios of speech-based information management, current technology is not sufficiently robust for useful systems. There has been some recent work on using automatic speech recognition to transcribe spoken data for use with standard information retrieval techniques, but this work has mostly focused on broadcast news rather than conversational speech, where low recognition accuracy can hurt retrieval performance and make the written form difficult to read. Furthermore, speech recognition technology is brittle in the presence of reverberation and ambient noise, and it is sensitive to packet loss in wireless environments. Dialog tracking, which is needed both for interface control and information indexing, is in its infancy. The research described below will address aspects of these two technologies, or potential "agents", for use in both human-computer interaction (the user interface) and spoken information management.

Speech Recognition: As the first step in the information creation or access process, speech recognition performance is a critical determinant of system success. No amount of sophistication in the language processing step can correct for misrecognized key words, so accurate and robust recognition is essential for easy-to-use interfaces and speech transcription. This is particularly a problem for the goal of an "invisible computer", since in this case the user will typically not be wearing a head-mounted microphone, so the effects of reverberation and noise will be accentuated. Thus, this project involves research on:

Dialog Tracking: For a user interface based on human-computer dialog, it is important to track the dialog context in order to be able to resolve ambiguous references, to interpret information as a correction vs. a new piece of information, and to know when to clear the dialog context for initiating a new task. Without this ability, users need to form overly detailed queries and/or face tedious interaction sessions because of slow error recovery. For the meeting browser task, topic and speech act annotation is important for introducing structure that facilitates random access and fast-forward and reverse commands. Such annotation allows a user to quickly jump to the first mention of a topic and browse follow-on points, or to find a particular person's response to a posed question. Aspects of dialog tracking that this research addresses are:

Response Generation: In human-computer dialog systems, the quality of the voice response is important for human acceptance of the system and the wording choice is an important component of keeping the dialog on track. Aspects of response generation that this research addresses are:

(March 2000 -- March 2003)

 SPONSORS: DARPA No. N660019928924, IBM (see also Language Modeling Project), NSF (see also Speech Generation Project)

This project is also linked to the UW Portolano Project on Invisible Computing.





