Automatic Detection of Prosodic Constituents for Parsing

Colin Wills Wightman

This dissertation describes research directed towards increasing the accuracy and processing speed of spoken language systems by developing methods to utilize prosody. Specifically, algorithms to automatically label prosodic phrasal structure are developed, and a method of using this information to reduce syntactic ambiguity is investigated. The prosodic phrase structure is represented by a set of seven prosodic "break indices" motivated by linguistic theory. Three principal results are discussed: a quantitative examination of segmental lengthening near prosodic boundaries, an algorithm for automatically labeling prosodic boundaries, and a parse scoring mechanism. A measure of segmental lengthening is developed and applied to speech with hand-labeled break indices to study the relationship between lengthening and perceived boundary size. Lengthening near phrasal boundaries is found to be restricted to the rhyme of the final syllable prior to the boundary. Furthermore, at least four levels of phrasal structure can be differentiated on the basis of this lengthening. To detect the phrasal boundaries, a speech recognizer and the sentence transcription are used to obtain a phoneme segmentation, and a vector of features is generated for each word boundary. These features are motivated both by the lengthening results and by linguistic theory. The vectors are quantized via a binary tree, and a Hidden Markov Model is used to recover the sequence of boundaries (break indices) most likely to have produced the observed sequence of feature vectors. Break indices generated by this algorithm are highly correlated with those assigned by trained human listeners. A method is also developed by which the labels can be used in a speech understanding system to help identify the speaker's intended meaning. The approach taken here is to score a parse using analysis-by-synthesis. Based on a corpus of speech read by professional radio announcers, experimental results indicate that this method can achieve performance comparable to that of human listeners.
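
The Hidden Markov Model step described above, recovering the most likely label sequence from quantized feature vectors, is commonly done with Viterbi decoding. The following is a minimal sketch of that general technique, not the dissertation's implementation. The function name `viterbi`, the two-label toy model (the dissertation uses seven break indices), and every probability value are assumptions chosen for illustration only.

```python
import math


def viterbi(obs, n_states, log_init, log_trans, log_emit):
    """Return the most likely state sequence for discrete observations.

    obs       -- list of observation symbols (e.g. quantizer leaf indices)
    log_init  -- log_init[s]: log-probability of starting in state s
    log_trans -- log_trans[p][s]: log-probability of moving from p to s
    log_emit  -- log_emit[s][o]: log-probability of state s emitting o
    """
    if not obs:
        return []
    # delta[s]: best log-probability of any path ending in state s
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at this step
            best = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best] + log_trans[best][s] + log_emit[s][o])
            ptr.append(best)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: state 0 = "no break", state 1 = "break";
# symbols 0 and 1 stand for two quantizer outputs. All values are invented.
LOG = math.log
toy_init = [LOG(0.5), LOG(0.5)]
toy_trans = [[LOG(0.5), LOG(0.5)], [LOG(0.5), LOG(0.5)]]
toy_emit = [[LOG(0.9), LOG(0.1)], [LOG(0.2), LOG(0.8)]]

toy_labels = viterbi([0, 0, 1, 0], 2, toy_init, toy_trans, toy_emit)
```

Working in log-probabilities avoids numerical underflow on long word sequences. With the uniform transitions assumed here, the decoded label at each word boundary simply follows the more likely emission; a trained model would have non-uniform transitions that let neighboring boundaries influence each other.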

By integrating statistical methods and linguistic theory, an algorithm has been developed which can reduce the syntactic ambiguity encountered in spoken language systems. In addition, automatic labeling provides a powerful new tool for the study of prosody.

