Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and significant mining of complex text sources is beyond reach. In this talk I will describe work in relational, undirected graphical models for information extraction and data mining, and two pieces of recent work that take small steps toward joint models that make more unified decisions: (1) a random field method for noun co-reference resolution that has strong ties to graph partitioning, and (2) an extension of CRFs to factorial state representation, enabling simultaneous part-of-speech tagging and noun-phrase segmentation. Joint work with colleagues at UMass (Ben Wellner, Khashayar Rohanimanesh, Charles Sutton, David Pinto, Xing Wei, Wei Li, Bruce Croft), CMU (John Lafferty), and UPenn (Fernando Pereira).
Bio: Andrew McCallum is an Associate Professor at the University of Massachusetts, Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction from the Web. In the late 1990s he was a Research Scientist and Coordinator at Justsystem Pittsburgh Research Center. He was a post-doctoral fellow at Carnegie Mellon University after receiving his PhD from the University of Rochester in 1995. He is on the editorial board of the Journal of Machine Learning Research. For the past eight years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, document classification, finite state models, and learning from combinations of labeled and unlabeled data. Web page: http://www.cs.umass.edu/~mccallum.
In this talk, I will present an overview of the research areas of the Speech Group at Microsoft Research Asia. Demos and discussions of our work in bi-lingual text to speech, Mandarin speech recognition, and audio indexing will be presented.
Bio: Eric Chang graduated from M.I.T. in 1995 with a Ph.D. degree in electrical engineering and computer science. He joined Microsoft Research Asia in July 1999 to work in the area of speech technologies. Eric is currently the research manager of the speech group and is also spearheading a special project covering mobile user scenario technologies. A recent result from his group is the Chinese version of Office XP, which incorporates the Mandarin speech recognition engine developed at Microsoft Research Asia.
Prior to joining Microsoft Research, Eric was one of the founding members of the Research group at Nuance Communications, a pioneer in natural speech interface software for telecommunication systems. While at Nuance, Eric worked on various projects involving confidence score generation, acoustic modeling, and robust speech detection. He also led the technical effort to develop the Japanese version of the Nuance product. This project led to the world's first deployed Japanese natural language speech recognition system.
Eric has published papers in the fields of speech recognition, neural networks, and genetic algorithms in various journals and conferences. He is the author of several granted and pending patents. His research interests are spoken language understanding, machine learning, and signal processing.
Ancestral graph models were introduced by Richardson and Spirtes (Annals of Statistics, 2002) as a class of graphical models that is closed under conditioning and marginalization. In the case that the variables are jointly normally distributed, the models can be parameterized using "inverse covariances", regression coefficients, and error covariances. In this talk we describe how the maximum likelihood estimator of this set of parameters can be found by iterative conditional fitting. In each step of the suggested iterative conditional fitting procedure, a conditional distribution is estimated while a marginal distribution is assumed to be known and held fixed. We show that in the considered Gaussian case, the estimators of the parameters in the conditional distribution can be obtained by a (least-squares) regression on pseudo-variables.
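As a rough illustration of one such conditional-fitting step, the following sketch covers only the simplest special case, a graph with bidirected edges only, and is not the general ancestral graph procedure. The notation is assumed for illustration and does not come from the abstract: Y is a zero-mean Gaussian vector with covariance matrix \Sigma, sp(i) is the set of neighbours of vertex i, the subscript -i denotes all variables except i, and \lambda_i is the conditional variance of Y_i.

```latex
% Hold the marginal covariance \Sigma_{-i,-i} of Y_{-i} fixed.
% Pseudo-variables built from the fixed marginal:
Z_{\mathrm{sp}(i)} = \bigl[(\Sigma_{-i,-i})^{-1}\bigr]_{\mathrm{sp}(i),\,-i}\, Y_{-i}

% Because \Sigma_{ij} = 0 for j \notin \mathrm{sp}(i), the conditional
% distribution of Y_i is a linear regression on the pseudo-variables:
Y_i \mid Y_{-i} \sim \mathcal{N}\bigl(\Sigma_{i,\mathrm{sp}(i)}\, Z_{\mathrm{sp}(i)},\ \lambda_i\bigr)

% Least-squares regression of Y_i on Z_{\mathrm{sp}(i)} estimates
% \Sigma_{i,\mathrm{sp}(i)} (coefficients) and \lambda_i (residual variance);
% the variance of Y_i is then recovered as
\sigma_{ii} = \lambda_i + \Sigma_{i,\mathrm{sp}(i)}\,
  \bigl[(\Sigma_{-i,-i})^{-1}\bigr]_{\mathrm{sp}(i),\,\mathrm{sp}(i)}\,
  \Sigma_{\mathrm{sp}(i),\,i}
```

Cycling this update over all vertices i until the estimates stop changing gives the iterative scheme in this special case.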
Modeling speech as a sequence of articulatory movements has long been hypothesized as a way to summarize the information-bearing elements in a speech signal, and there has been work on using articulatory parameterizations of speech to improve recognition. Several databases provide physical measurements of human articulations in parallel with audio, but not many methods have been developed to make effective use of these measurements in improving acoustic recognition. This talk describes an attempt to use the MOCHA-TIMIT database in such a system. Specifically, it attempts to make use of data that measure the contact between the tongue and hard palate. These measurements are used to guide the training of acoustic parameters in a DBN for phone recognition.
We present a simple yet effective technique to reduce the cost of likelihood computation in ASR systems that use continuous density HMMs. In a variety of speech recognition tasks, likelihood evaluation accounts for a significant portion of the total computational load. Our proposed method, under certain conditions, evaluates the component likelihoods of only certain features, and approximates those of the pruned features by prediction. We investigate two feature clustering approaches associated with our pruning technique. With a data-driven approach, we can speed up the likelihood evaluation by 33% and reduce its power consumption by 27% for an isolated word recognition task. For a continuous speech recognition system using either monophone or triphone models, the speedup and power reduction of the likelihood evaluation are 50% and 35%, respectively.
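The general idea of feature pruning can be sketched as follows for a single diagonal-covariance Gaussian component. This is only a minimal illustration under stated assumptions, not the method from the talk: the pruning condition (a threshold on the partial score) and the predictor (extrapolating the average per-feature score of the evaluated features) are hypothetical choices, as are the names `gaussian_term`, `full_loglik`, and `pruned_loglik`.

```python
import math


def gaussian_term(x, mean, var):
    """Log-density contribution of one feature under a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def full_loglik(obs, mean, var):
    """Exact log-likelihood of a diagonal-covariance Gaussian component."""
    return sum(gaussian_term(x, m, v) for x, m, v in zip(obs, mean, var))


def pruned_loglik(obs, mean, var, kept, threshold):
    """Evaluate only the features in `kept`; predict the rest when pruning.

    Hypothetical pruning rule: if the partial score on the kept features
    falls below `threshold`, the remaining features are not evaluated and
    their contribution is predicted from the average kept-feature score.
    Returns (score, was_approximated).
    """
    if not kept:
        raise ValueError("kept must contain at least one feature index")
    kept_set = set(kept)
    partial = sum(gaussian_term(obs[i], mean[i], var[i]) for i in kept_set)
    pruned = [i for i in range(len(obs)) if i not in kept_set]
    if partial >= threshold or not pruned:
        # Component still looks competitive: evaluate the rest exactly.
        rest = sum(gaussian_term(obs[i], mean[i], var[i]) for i in pruned)
        return partial + rest, False
    # Assumed predictor: extrapolate the mean per-feature score.
    predicted = partial / len(kept_set) * len(pruned)
    return partial + predicted, True


if __name__ == "__main__":
    obs = [5.0, 5.0, 0.0, 0.0]
    mean = [0.0, 0.0, 0.0, 0.0]
    var = [1.0, 1.0, 1.0, 1.0]
    score, approximated = pruned_loglik(obs, mean, var, kept=[0, 1], threshold=-10.0)
    print(score, approximated, full_loglik(obs, mean, var))
```

In this sketch the saving comes from skipping the per-feature terms for components whose partial score is already poor; how features are grouped into evaluated and pruned sets is where a clustering step, such as the two approaches mentioned above, would come in.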