University of Washington
Department of Electrical Engineering

SSLI-LAB : Signal, Speech, and Language Interpretation Seminar

Spring Quarter, 2005
RM EE1-403 EE1 Bldg (unless otherwise specified)
University of Washington, Seattle

Wed, 14th Dec 2005 (EE1-303, 2:00pm-3:00pm)
High dimensional probabilistic modelling through manifolds
-- Neil Lawrence
University of Sheffield, U.K.

Abstract
Density modelling in high dimensions is traditionally a very difficult problem. Approaches such as mixtures of Gaussians typically fail to capture the structure of data sets in high dimensional spaces. In this talk we will argue that, for many data sets of interest, the data can be represented as a lower dimensional manifold immersed in the higher dimensional space. We will then present the Gaussian Process Latent Variable Model (GP-LVM), a non-linear probabilistic variant of principal component analysis (PCA) which implicitly assumes that the data lies on a lower dimensional space. We will demonstrate the application of the model to a range of data sets, with a particular focus on human motion data. We will show some preliminary work on facial animation and use a skeletal motion capture data set to illustrate differences between our model and traditional manifold techniques.

Bio: Neil Lawrence received his PhD from Cambridge University in 2000, after which he spent a year at Microsoft Research, Cambridge. He is currently a senior lecturer in the Department of Computer Science, University of Sheffield, U.K. His research interests are probabilistic models, with a particular focus on Gaussian processes. He has a particular interest in applications of these models, and his recent work has involved computational biology, speech, vision and graphics.

Thursday, 10th Nov 2005 (AE-108, 2:00pm-3:00pm)
Indexing Uncertainty for Efficient Search in Spoken Documents
-- Ciprian Chelba
Microsoft Research

Abstract
Speech search has not received much attention because large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage is becoming cheaper, the availability and usefulness of large collections of spoken documents is limited strictly by the lack of adequate technology to exploit them. Our current work aims at extending the standard keyword search paradigm from text documents to spoken documents. To deal with the limitations of current automatic speech recognition (ASR) technology, we propose an approach that uses RECOGNITION LATTICES --- which are considerably more accurate than the ASR 1-best output. The position of a given word in the spoken document becomes a random variable. Standard techniques are extended to make use of SOFT-HITS --- a probabilistic version of the regular hit encountered in text indexing. In experiments performed on a collection of lecture recordings --- MIT iCampus data --- the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer.
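
The idea of a soft hit can be illustrated with a minimal sketch. The abstract does not give the indexing or ranking details, so everything below is an assumption made for illustration: each document is simplified to a list of fixed position slots, each slot holding hypothetical word posteriors (in a real lattice the position itself is uncertain), and a document is scored by the expected count of the query word. The function names and the toy data are invented for this example.

```python
from collections import defaultdict


def build_soft_index(docs):
    """Build an inverted index of soft hits.

    docs maps a document id to a list of position slots; each slot is a
    dict {word: posterior probability}. This input format is hypothetical.
    """
    index = defaultdict(list)
    for doc_id, positions in docs.items():
        for pos, candidates in enumerate(positions):
            for word, prob in candidates.items():
                # A soft hit keeps the posterior alongside the usual posting.
                index[word].append((doc_id, pos, prob))
    return index


def expected_counts(index, word):
    """Score each document by the expected number of occurrences of word."""
    scores = defaultdict(float)
    for doc_id, _pos, prob in index.get(word, []):
        scores[doc_id] += prob
    return dict(scores)


def one_best(positions):
    """Baseline: keep only the most probable word in each slot."""
    return [max(candidates, key=candidates.get) for candidates in positions]


# Toy posteriors, invented for illustration only.
docs = {
    "lecture1": [{"speech": 0.6, "peach": 0.4}, {"search": 0.9, "church": 0.1}],
    "lecture2": [{"peach": 0.7, "speech": 0.3}, {"pie": 1.0}],
}
index = build_soft_index(docs)
scores = expected_counts(index, "speech")
```

In this toy data the 1-best transcript of "lecture2" does not contain "speech" at all, whereas the soft index still assigns that document a non-zero score for the query.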

Monday, 7th Nov 2005 (EE1-403, 11:00am-12:00pm)
Some Recent Advances in Gaussian Mixture Modeling for Speech Recognition
-- Ramesh A. Gopinath
Manager, Pervasive Speech Technologies HLT Department, IBM T. J. Watson Research

Abstract
State-of-the-art Hidden Markov Model (HMM) based speech recognition systems typically use Gaussian Mixture Models (GMMs) to model the acoustic features associated with each HMM state. Due to computational, storage and robust estimation considerations, the covariances of the Gaussians in these GMMs are typically diagonal. In this talk I will describe several new techniques that better model the acoustic features associated with an HMM state: subspace constrained GMMs (SCGMMs), non-linear volume-preserving acoustic feature space transformations, etc. Even with better models, one has to deal with mismatches between the training and test conditions. This problem can be addressed by adapting either the acoustic features or the acoustic models to reduce the mismatch. I will present several approaches to adaptation: FMAPLR (a variant of FMLLR that works well with very little adaptation data), adaptation of the front-end parameters, adaptation of SCGMMs, etc. While the ideas presented are explored and evaluated in the context of speech recognition, the talk should appeal to anyone with an interest in statistical modeling.
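
As background for the diagonal-covariance baseline mentioned above, the following is a minimal sketch of how the log-likelihood of one feature vector under a diagonal-covariance GMM can be computed. It is not taken from the talk and does not implement SCGMMs or any of the adaptation methods; the function names and parameter layout are assumptions made for illustration. With diagonal covariances each component costs O(d) in the feature dimension d, which is one reason the restriction is attractive.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance.

    x, mean and var are equal-length sequences; var holds the per-dimension
    variances (the diagonal of the covariance matrix).
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a mixture of diagonal Gaussians.

    Uses the log-sum-exp trick so that very small component likelihoods
    do not underflow.
    """
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# Illustrative two-component mixture over 2-dimensional features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
loglik = gmm_loglik([0.0, 0.0], weights, means, variances)
```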




Last updated ($Date: 2005/12/13 07:43:49 $)