Discriminatively Structured Dynamic Graphical Models for Speech Recognition
JHU CLSP Summer Workshop for 2001
JHU CLSP and SSLI Laboratory, University of Washington, Dept. of Electrical Engineering
Jeff Bilmes <bilmes@ee.washington.edu> University of Washington, Dept. of EE
Geoff Zweig <gzweig@us.ibm.com> IBM Yorktown Heights Research Center
Thomas S. Richardson <tsr@stat.washington.edu> University of Washington, Dept. of Statistics
Johan Schalkwyk <johans@speechworks.com> Speechworks
Kirk Jackson <kirkjack@afterlife.ncsc.mil> NCSC
Karen Livescu <klivescu@sls.lcs.mit.edu> MIT
Peng Xu <xp@clsp.jhu.edu> JHU
Eva Holtz <eholtz@fas.harvard.edu> Harvard
Jerry Torres <jrey@stanford.edu> Stanford
Sanjeev Khudanpur <sanjeev@clsp.jhu.edu> JHU
Bill Byrne <byrne@clsp.jhu.edu> JHU
The state-of-the-art in automatic speech recognition (ASR) by computer has undergone many significant advances over the past 20 years. The underlying approach, however, still involves using hidden Markov models (HMMs). Most experts believe that we must move beyond the HMM in order to significantly advance the field. This project involves advancing beyond HMMs in careful, data-driven, and task-oriented ways, and applying these techniques to the problem of hands-free ASR in automobiles, and to general conversational ASR.
We will apply the techniques outlined below to a new automobile speech corpus (speech recorded in a variety of acoustic automobile environments), and we will use new signal-processing methods to extract acoustic features from both normal and array microphones. We will also apply these techniques to the Switchboard database, a collection of recordings of natural telephone conversations.
- Our methodology will utilize graphical models (GMs), a method whereby a large assortment of statistical models can be quickly and accurately specified. A GM allows one to easily specify the underlying conditional independence properties of a model, and thereby the important properties of naturally spoken speech.
- To reduce model size, we will use switching GMs, where dependencies may activate and deactivate as a function of other graph variables.
- The data itself will be used to determine the structure of the GM; that is, rather than guessing which model topology works best, we will let the data guide our choice. In particular, we will use fine-grained data-mining techniques to automatically discover, from massive collections of speech data, the important speech properties that will determine the new ASR model.
- Finally, the criterion for specifying GM structure will focus on decreasing ASR errors.
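As a rough illustration of the switching idea in the list above, the following toy sketch shows a child variable whose active parent is selected by a switching variable. It is not the workshop toolkit or its API: the variable names (S, A, B, C), the function p_child, and all probability values are invented for illustration. The point is only that a switched dependency needs fewer parameters than a full conditional table over every parent.

```python
from itertools import product

# Hypothetical toy variables (not from the workshop); all are binary.
# S is the switching variable; A and B are candidate parents of the child C.
P_C1_GIVEN_A = {0: 0.9, 1: 0.2}  # P(C=1 | A=a), active when S == 0
P_C1_GIVEN_B = {0: 0.1, 1: 0.7}  # P(C=1 | B=b), active when S == 1


def p_child(c, s, a, b):
    """Return P(C=c | S=s, A=a, B=b), with the parent of C selected by S."""
    p1 = P_C1_GIVEN_A[a] if s == 0 else P_C1_GIVEN_B[b]
    return p1 if c == 1 else 1.0 - p1


# A binary child needs one free parameter per parent configuration.
# Full table over (S, A, B): 2**3 configurations.
full_table_params = len(list(product([0, 1], repeat=3)))
# Switched model: one small table per switch value.
switching_params = len(P_C1_GIVEN_A) + len(P_C1_GIVEN_B)

if __name__ == "__main__":
    # With S == 0 the value of B has no effect on C.
    print(p_child(1, 0, 1, 0), p_child(1, 0, 1, 1))
    print(full_table_params, switching_params)
```

In this toy case the switched model uses 4 free parameters where the full table would use 8, and the gap widens as the number of candidate parents grows.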
We will use a new GM toolkit, made available and optimized for ASR tasks, to conduct our research. Students will be provided instruction on both the theory of graphical models and use of the toolkit during the two preparatory weeks of the summer.
This research will enhance hands-free interactive voice-response capability in cellular telephones in cars, tourist information kiosks, etc., and lead to robust ASR for automatically transcribing group meetings, court proceedings, etc.
Outcome of 1st Planning Meeting
Outcome of 2nd Planning Meeting
Articulatory Reading List (thanks to Katrin Kirchhoff)
Miscellaneous Information:
The entire group can be reached via this email list.
Reading Lists:
The following is a list of papers that will be useful to read prior to attending the workshop.
J. Bilmes. Dynamic Bayesian Multinets. (pdf) The 16th Conference on Uncertainty in Artificial Intelligence, Stanford, July 2000.
J. Bilmes. Factored Sparse Inverse Covariance Matrices. (gzipped ps or pdf) IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, June 2000.
J. Bilmes. Buried Markov Models for Speech Recognition. (gzipped ps or pdf) IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, March 1999.
J. Bilmes. Data-Driven Extensions to HMM Statistical Dependencies. (gzipped ps or pdf) Int. Conf. on Spoken Language Processing, Dec 1998.
J. Bilmes. Graphical Models and Automatic Speech Recognition. (pdf) UWEETR-2001-0005, Dec 2001.
J. Bilmes. Natural Statistical Models for Automatic Speech Recognition. (postscript or pdf) Ph.D. Thesis, Dept. of EECS, CS Division, U.C. Berkeley, 1999.
Zweig Papers
Speech Recognition with Dynamic Bayesian Networks. G. Zweig and S. Russell, AAAI98.
Probabilistic Modeling with Bayesian Networks and Automatic Speech Recognition. G. Zweig and S. Russell, AJII.
Dependency Modeling with Bayesian Networks in a Voicemail Transcription System. G. Zweig and M. Padmanabhan, Eurospeech99.
Speech Recognition with Dynamic Bayesian Networks. G. Zweig, Ph.D. thesis.
THE TETRAD PROJECT: CONSTRAINT BASED AIDS TO MODEL SPECIFICATION. R.Scheines, C.Glymour, P.Spirtes, C.Meek and T.Richardson, to appear in Multivariate Behavioral Research.
A POLYNOMIAL-TIME ALGORITHM FOR DECIDING MARKOV EQUIVALENCE OF DIRECTED CYCLIC GRAPHICAL MODELS. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, Oregon, 1996. E.Horvitz and F.Jensen (eds.), Morgan Kaufmann, San Francisco, CA.
A DISCOVERY ALGORITHM FOR DIRECTED CYCLIC GRAPHS. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, Oregon, 1996. E.Horvitz and F.Jensen (eds.), Morgan Kaufmann, San Francisco, CA.
AUTOMATED DISCOVERY OF LINEAR FEEDBACK MODELS. T.Richardson, P.Spirtes, to appear in Causality and Computation, C.Glymour, (ed.), MIT Press.
THE DIMENSIONALITY OF MIXED ANCESTRAL GRAPHS. P.Spirtes, T.Richardson, C.Meek. CMU-PHIL-83, Nov 1997.
General Graphical Models Papers
K. Murphy's brief overview of Bayes nets. March, 1995 (revised November, 1996).
D. Heckerman, D. Geiger, D. Chickering. Learning Bayesian networks: The Combination of Knowledge and Statistical Data. Technical Report MSR-TR-94-09, Microsoft Research, March, 1994 (revised December, 1994).
Thiesson, Bo; Meek, Christopher; Chickering, David Maxwell; Heckerman, David. Learning Mixtures of DAG Models, December 1997 (revised May 1998).
Bilmes, Jeff. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. ICSI-TR-97-021.
Roweis, S.T. and Ghahramani, Z. A Unifying Review of Linear Gaussian Models
H. Attias. Independent Factor Analysis.
Ross D. Shachter. Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence
D. Heckerman. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, March, 1995 (revised November, 1996).
Learning Probabilistic Networks by Paul J. Krause, manuscript, 1998.
NIPS 95 Workshop on Learning in Bayesian Networks and Other Graphical Models
A Guide to the Literature on Learning Probabilistic Networks From Data. Literature review on learning graphical models, in IEEE Trans. on Knowledge and Data Engineering. Final draft submitted 29th Nov., '95.
John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa. "Adaptive Probabilistic Networks with Hidden Variables." Machine Learning, 29, 213-244, 1997.
N. Friedman. The Bayesian Structural EM Algorithm
N. Friedman. Learning belief networks in the presence of missing values and hidden variables
N. Friedman and D. Koller. Being Bayesian about Network Structure
H. Attias 1999. Inferring parameters and structure of latent variable models by variational Bayes. Proc. 15th Conference on Uncertainty in Artificial Intelligence.