Automatic Summarization of Recorded Lectures
In the past decade, the amount of information stored in large, publicly
accessible databases has increased dramatically. Most of this data consists of
text documents, which is why much effort in recent years has been devoted to
text-based information extraction methods, such as the keyword-based document
retrieval techniques familiar from web search engines. However, online data
collections increasingly include not only written documents but also video and
audio documents. For this reason, advanced tools for categorizing, indexing
and extracting information from multimedia documents will become indispensable
in the near future.
The goal of this project is to explore methods for automatically
summarizing spoken documents, in particular recordings of academic lectures.
We will employ automatic speech recognition technology in order to derive a
representation of the spoken document which will serve as the basis for
automatic information extraction and summarization techniques. Particular
emphasis will be given to the use of prosodic information for highlighting
relevant portions of the audio signal. Prosodic information includes
aspects of the speaker's intonation, speaking rate, accentuation of
individual syllables, etc. Along with methods for extracting these parameters,
we will develop new scoring methods which integrate word-based relevance
measures with prosody-based relevance measures and confidence values for
the different information sources.
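As an illustration only, one simple way to integrate the two relevance measures is a confidence-weighted average per lecture segment. The sketch below is not the project's scoring method, which remains to be developed; the `Segment` structure, the score ranges, and the functions `combined_score` and `summarize` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One candidate unit of a lecture (hypothetical structure)."""
    text: str
    word_score: float     # word-based relevance, assumed to lie in [0, 1]
    prosody_score: float  # prosody-based relevance, assumed to lie in [0, 1]
    word_conf: float      # confidence in the recognized words, in [0, 1]
    prosody_conf: float   # confidence in the prosodic features, in [0, 1]


def combined_score(seg: Segment) -> float:
    """Confidence-weighted average of the two relevance scores."""
    total = seg.word_conf + seg.prosody_conf
    if total == 0:
        # Neither information source is trusted, so the segment gets no score.
        return 0.0
    weighted = (seg.word_conf * seg.word_score
                + seg.prosody_conf * seg.prosody_score)
    return weighted / total


def summarize(segments: List[Segment], k: int) -> List[Segment]:
    """Select the k highest-scoring segments, returned in lecture order."""
    ranked = sorted(range(len(segments)),
                    key=lambda i: combined_score(segments[i]),
                    reverse=True)
    return [segments[i] for i in sorted(ranked[:k])]
```

With this weighting, a segment whose transcript is unreliable (low `word_conf`) is judged mainly by its prosody, and vice versa, which is one possible reading of "confidence values for the different information sources".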
This project will be carried out in cooperation with the program for
Education at a Distance for Growth and Excellence (EDGE)
at the College of Engineering of the University of Washington.
This program provides educational resources to distance learning students,
including streaming video of lectures given by Engineering faculty.
We will use the audio portions of these recordings as data for system
development and evaluation.
SPONSOR
University of Washington Royalty Research Fund
TEAM MEMBERS
Katrin Kirchhoff, PI
PUBLICATIONS
none yet
PROJECT DURATION
10/01/2001 - 09/30/2002