Automatic Detection of Prosodic Constituents for Parsing
Colin Wills Wightman
This dissertation describes research directed towards increasing the
accuracy and processing speed of spoken language systems by developing
methods to utilize prosody. Specifically, algorithms to automatically
label prosodic phrasal structure are developed and a method of using this
information to reduce syntactic ambiguity is investigated. The prosodic phrase
structure is represented by a set of seven prosodic "break indices" motivated
by linguistic theory. Three principal results are discussed: a quantitative
examination of segmental lengthening near prosodic boundaries, an automatic
algorithm for labeling prosodic boundaries, and a parse scoring mechanism.
A measure of segmental lengthening is developed and applied to speech with
hand-labeled break indices to study the relationship between lengthening
and perceived boundary size. Lengthening near phrasal boundaries is found to
be restricted to the rhyme of the final syllable prior to the boundary.
Furthermore, at least four levels of phrasal structure can be differentiated
on the basis of this lengthening. To detect the phrasal boundaries, a speech
recognizer and the sentence transcription are used to obtain a phoneme
segmentation, and a vector of features is generated for each word boundary.
These features are motivated both by the lengthening results and by linguistic
theory. The vectors are quantized via a binary tree and a Hidden Markov Model
is used to recover the sequence of boundaries (break indices) most likely to
have produced the sequence of feature vectors observed. Break indices
generated by this algorithm are highly correlated with those assigned by trained
human listeners. A method is also developed by which the labels can be used in a
speech understanding system to help identify the speaker's intended meaning.
The approach taken here is to score a parse using analysis-by-synthesis.
Based on a corpus of speech read by professional radio announcers,
experimental results indicate that this method can achieve performance
comparable to that of human listeners.
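The decoding step described above, recovering the most likely sequence of break indices from a sequence of quantized feature vectors, can be illustrated with a standard Viterbi search over a discrete Hidden Markov Model. The sketch below is not the implementation used in the dissertation. The function name `viterbi`, the two-state model, and all probabilities are hypothetical and chosen only to keep the example small, whereas the dissertation uses seven break indices and features quantized by a binary tree.

```python
import math


def viterbi(obs, log_init, log_trans, log_emit):
    """Return the most likely state sequence for a non-empty list of codes.

    obs       -- quantizer codes, one per word boundary
    log_init  -- log_init[s]: log-probability of starting in state s
    log_trans -- log_trans[p][s]: log-probability of moving from p to s
    log_emit  -- log_emit[s][o]: log-probability that state s emits code o
    """
    n_states = len(log_init)
    # score[s] is the best log-probability of any path ending in state s.
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    back = []  # back[t][s] is the best predecessor of s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy model (hypothetical numbers): state 0 = minor break, state 1 = major break.
LOG_INIT = [math.log(0.5), math.log(0.5)]
LOG_TRANS = to_log([[0.9, 0.1], [0.1, 0.9]])
LOG_EMIT = to_log([[0.9, 0.1], [0.1, 0.9]])

decoded = viterbi([0, 0, 1, 1], LOG_INIT, LOG_TRANS, LOG_EMIT)
```

Working in log-probabilities avoids numerical underflow on long utterances. In a real system the model parameters would be estimated from hand-labeled data rather than set by hand as they are here.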
By integrating statistical methods and linguistic theory, an algorithm
has been developed which can reduce the syntactic ambiguity encountered
in spoken language systems. In addition, automatic labeling provides a
powerful new tool for the study of prosody.