University of Washington
Department of Electrical Engineering

SSLI-LAB : Signal, Speech, and Language Interpretation Seminar

Spring Quarter, 2007
RM (see below) EEB Bldg (unless otherwise specified)
University of Washington, Seattle

Fri, 13th April 2007 (EEB-403, 10:30am-11:30am)
Bayesian Inference of Grammars
-- Mark Johnson
Microsoft Research/Brown University

Abstract
Although Maximum Likelihood Estimation (MLE) of Probabilistic Context-Free Grammars (PCFGs) is well understood (the Inside-Outside algorithm can do this efficiently from the terminal strings alone), the inferred grammars are usually linguistically inaccurate. To better understand why maximum likelihood finds poor grammars, this talk examines two simple natural language induction problems: morphological segmentation and word segmentation. We identify several problems with the MLE PCFG models for these tasks and propose Hierarchical Dirichlet Process (HDP) models to overcome them. To test the HDP models, we develop MCMC algorithms for Bayesian inference from strings alone. Finally, we discuss to what extent the lessons learnt from these examples can be put into a unified framework and applied to the general problem of grammar induction.

Joint work with Tom Griffiths (Berkeley) and Sharon Goldwater (Stanford)


Last updated: 2007/10/10 19:34:24 UTC