University of Washington
Department of Electrical Engineering
Speech and Signal Processing Seminar

Fall Quarter, 2001
12:30-1:30PM Tuesdays
RM EE1 003 New EE Bldg
University of Washington, Seattle
(unless otherwise noted)

2 October 2001
On Frequency Localized Temporal Processing of Speech
-- Ashwin Rao, Conversational Computing Corp.

9 October 2001
Instantaneous Spectral Moments
-- Prof. Patrick J. Loughlin, University of Pittsburgh

Abstract
Over the past 60 years, it has been discovered that many signals, both natural and man-made, have time-varying spectral characteristics. From early on, it was realized that analyzing these "nonstationary" signals by way of a joint distribution of time and frequency could bring to bear on the problem the standard techniques of ordinary probability and statistics. In particular the concept of conditional moments (e.g., the mean frequency at a given time, the standard deviation, etc.), could offer a means to characterize spectral variations and could be used in the same way that moments are used in other fields. However, defining these quantities for nonstationary signals such that they share the usual properties and interpretations of ordinary moments has been a challenging problem. We will present some recent results in this regard. We have been able to define conditional moments of time and frequency that characterize important time-varying spectral structure. We will illustrate how conditional moments can contribute to our understanding of nonstationary signals, including their characterization, classification, and propagation, and how these features relate to physical attributes of the signal.
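
As a rough illustration of the conditional moments mentioned above, the sketch below treats each normalized row of a spectrogram as a density over frequency at a given time, and computes its mean and standard deviation. The spectrogram is only one possible joint time-frequency distribution, and it is an assumption made here for illustration; the definitions presented in the talk may differ. The chirp parameters, frame length, and function names are likewise illustrative.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Squared-magnitude short-time DFT: one row of bin energies per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            x = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(x) ** 2)
        frames.append(row)
    return frames


def conditional_moments(frames, sample_rate, frame_len):
    """Mean frequency and standard deviation (Hz) at each frame time."""
    out = []
    for row in frames:
        total = sum(row)  # marginal energy at this time
        freqs = [k * sample_rate / frame_len for k in range(len(row))]
        mean = sum(f * p for f, p in zip(freqs, row)) / total
        var = sum((f - mean) ** 2 * p for f, p in zip(freqs, row)) / total
        out.append((mean, math.sqrt(var)))
    return out


# Illustrative test signal: a linear chirp sweeping from 500 Hz to 1500 Hz.
sample_rate = 8000
n_samples = 2000
duration = n_samples / sample_rate
signal = [
    math.cos(2 * math.pi * (500 * t + 0.5 * (1000 / duration) * t * t))
    for t in (n / sample_rate for n in range(n_samples))
]
frames = spectrogram(signal, frame_len=128, hop=64)
moments = conditional_moments(frames, sample_rate, frame_len=128)
for i in (0, len(moments) // 2, len(moments) - 1):
    print(f"frame {i}: mean {moments[i][0]:.0f} Hz, spread {moments[i][1]:.0f} Hz")
```

For the chirp, the conditional mean frequency should rise from frame to frame, which is the kind of time-varying spectral structure the abstract refers to.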

16 October 2001
Aural Cinema, Visual Music: Art in the Digital Era
-- Richard Karpen, Director of the UW Center for Digital Arts

Abstract
Artists working with digital technologies are redefining art, music, theater, film, and architecture, often dissolving the boundaries between these traditional forms. For example, composers are creating digital sound art and aural cinema, using complex signal processing techniques to invent sounds of extraordinary intricacy which are virtually located and moved through space. Video artists and computer animators are creating previously unimaginable digital spaces immersed in poetic sensory illusion. Digital Art goes far beyond merely affecting how artists work, and beyond simply using computers to simulate pre-digital forms of art. Artists, engineers, designers, and scientists collaborate and exchange roles to create digitally-realized images, sounds, performances, and installations that have never before been heard, seen, and experienced.

23 October 2001
No seminar this week

30 October 2001
Exploiting Syntactic Structure for Natural Language Modeling
-- Ciprian Chelba, Microsoft Research

Abstract
The talk presents an attempt at using the syntactic structure in natural language for improved language models for large vocabulary speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard and Broadcast News corpora show improvement in both perplexity and word error rate --- word lattice rescoring --- over the standard 3-gram language model. Further experiments investigate the portability of syntactic structure across domains --- Wall Street Journal to Air Travel Information Systems --- as well as the use of the structured language model for information extraction from text.
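
For readers unfamiliar with the baseline and metric named above, here is a minimal sketch of a 3-gram language model and its perplexity. It is not the structured language model from the talk. Add-one smoothing, the toy sentences, and the function names are assumptions chosen for brevity, and the evaluation text is assumed to contain only words seen in training.

```python
import math
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories, padding sentence boundaries."""
    tri, bi, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        vocab.update(padded[2:])
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            tri[history + (padded[i],)] += 1
            bi[history] += 1
    return tri, bi, vocab


def perplexity(sentences, tri, bi, vocab):
    """Perplexity under add-one smoothed trigram probabilities."""
    log_prob, n_words = 0.0, 0
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            p = (tri[history + (padded[i],)] + 1) / (bi[history] + len(vocab))
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)


# Toy data (hypothetical), only to exercise the functions.
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
tri, bi, vocab = train_trigram(train)
familiar = perplexity([["the", "cat", "sat"]], tri, bi, vocab)
scrambled = perplexity([["sat", "the", "dog"]], tri, bi, vocab)
print(f"familiar word order: {familiar:.2f}, scrambled word order: {scrambled:.2f}")
```

Lower perplexity means the model finds the text more predictable, which is why it is reported alongside word error rate.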

6 November 2001
Object Based Video Indexing/Retrieving: A Step Closer to Automatic Video Understanding
-- Prof. Jenq-Neng Hwang, UW EE Department

Abstract
In this talk, I will present a novel scheme for object-based video indexing and retrieval based on video abstraction and semantic event modeling. The proposed algorithm consists of three major steps: Video Object (VO) extraction, object-based video abstraction, and statistical modeling of semantic features. A video object abstraction algorithm based on clustering analysis is described for reducing data redundancy and providing reliable feature data for the next stage of the algorithm. A semantic feature modeling scheme is also proposed, based on the temporal variation of low-level features in the object area between adjacent frames of a video sequence. Each semantic feature is represented by a Hidden Markov Model (HMM) that characterizes the temporal nature of the VO with various combinations of object features. I will also present experimental results to demonstrate the effective performance of the proposed approach.

13 November 2001
Modulation Frequency Analysis: New Features for Coding and Classification
-- Prof. Les Atlas, UW EE Department

20 November 2001
Seeing 3D: the Space of All Stereo Images
-- Prof. Steve Seitz, UW CSE Department

Abstract
A stereo pair consists of two images with purely horizontal parallax, that is, every scene point visible in one image projects to a point in the same row of the other. Stereo images play a central role in both human depth perception and computer-based shape reconstruction techniques. However, a single stereo pair typically yields a very incomplete perception of the world, due to limited coverage and field of view. In this talk, I will describe a class of new "panoramic" stereo image representations that can be used to image an entire scene at once. These images can be acquired by moving a conventional camera along a path and compositing pixels from different views into a "multiperspective" mosaic image. Future sensor designs may enable capturing such images directly. I will show several examples of multiperspective stereo images, and motivate their use for visualization and 3D reconstruction of objects and scenes. In addition, I will classify the space of all possible stereo images, by defining all distributions of light rays and sensor designs that produce a stereo pair.

See http://grail.cs.washington.edu/projects/stereo/ for more information on this research.

27 November 2001
Learning Lie Groups for Invariant Pattern Recognition
-- Rajesh Rao, UW CSE Department

4 December 2001
HMM-Based Speech Synthesis -- toward Human-like Talking Machines
-- Prof. Keiichi Tokuda, Nagoya Institute of Technology

Abstract
The increasing availability of large speech databases makes it possible to construct speech synthesis systems by applying statistical learning algorithms; such systems are referred to as data-driven, corpus-based, speaker-driven, or trainable approaches. These systems, which can be trained automatically, not only generate natural, high-quality synthetic speech but can also reproduce the voice characteristics of the original speaker. This talk presents one of these approaches: HMM-based speech synthesis, in which synthetic speech is generated directly from HMMs. Algorithms for speech parameter generation from HMMs and a mel-cepstrum based vocoding technique are reviewed, and an approach to simultaneous modeling of spectrum, F0, and state duration is also presented. The main feature of the system is the use of dynamic features: by including dynamic coefficients in the feature vector, the dynamic coefficients of the speech parameter sequence generated in synthesis are constrained to be realistic, as defined by the parameters of the HMMs. The attraction of this approach is that the voice characteristics of synthesized speech can be changed by transforming the HMM parameters appropriately. In fact, it is shown that we can change the voice characteristics of synthetic speech by applying a speaker adaptation technique, a speaker interpolation technique, or an eigenvoice technique. The relation between the HMM-based approach and other concatenative speech synthesis approaches is also discussed.
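
The role of dynamic features in parameter generation can be sketched in one dimension. Assuming the HMM state sequence is already fixed, each frame has a Gaussian mean and variance for a static coefficient and for its delta. The most likely static trajectory c then solves (W' S^-1 W) c = W' S^-1 m, where W maps the static sequence to stacked static and delta features, m holds the means, and S the variances. The delta window (-0.5, 0, 0.5), the clamped frame edges, the numeric values, and all function names are assumptions for illustration; a real system uses mel-cepstral vectors and is not reproduced here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Most likely static sequence given per-frame static and delta Gaussians."""
    T = len(static_mean)
    rows = []  # each entry: (row of W, target mean, variance)
    for t in range(T):
        static_row = [0.0] * T
        static_row[t] = 1.0
        rows.append((static_row, static_mean[t], static_var[t]))
        delta_row = [0.0] * T
        delta_row[min(t + 1, T - 1)] += 0.5  # clamp at the last frame
        delta_row[max(t - 1, 0)] -= 0.5      # clamp at the first frame
        rows.append((delta_row, delta_mean[t], delta_var[t]))
    # Normal equations of the weighted least-squares problem.
    A = [[sum(r[i] * r[j] / v for r, _, v in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(r[i] * m / v for r, m, v in rows) for i in range(T)]
    return solve(A, b)


# Two hypothetical states: static mean 0 for five frames, then 1 for five frames.
static_mean = [0.0] * 5 + [1.0] * 5
trajectory = generate_trajectory(static_mean, [0.1] * 10, [0.0] * 10, [0.05] * 10)
print([round(c, 3) for c in trajectory])
```

Without the delta terms the output would simply be the step-shaped sequence of state means; with them, the transition between the two states is spread over several frames.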

11 December 2001
The Graphical Models Toolkit
-- Prof. Jeff Bilmes, UW EE Department

Abstract
This talk will describe the Graphical Models Toolkit (GMTK), an open source, publicly available toolkit for developing graphical-model-based speech recognition and general time series systems. Graphical models are a flexible, concise, and expressive probabilistic modeling framework with which one may rapidly specify a vast collection of statistical models. The talk will begin with a brief description of the representational and computational aspects of the framework. Following that will be a detailed description of GMTK's features, including a language for specifying structures and probability distributions, logarithmic-space exact training and decoding procedures, the concept of switching parents, and a generalized EM training method which allows arbitrary sub-Gaussian parameter tying. Taken together, these features endow GMTK with a degree of expressiveness and functionality that significantly complements other publicly available packages. GMTK was recently used in the 2001 Johns Hopkins Summer Workshop, and experimental results will also be described.

--- Joint work with Geoff Zweig.


Last updated: 2001/12/03 20:49:28