Linguistics is often divided into S-linguistics (syntax and semantics) and P-linguistics (phonetics and phonology), interfacing at the lexical layer. This division carries the assumption that S-linguistics and P-linguistics are conditionally independent given the lexical sequence. However, speech contains paralexical information (such as prosody) that is not represented in the unadorned lexical sequence. This work takes a natural-language-processing approach to relaxing this conditional independence assumption by examining ways in which statistical parsers can be improved by incorporating prosodic information. Prosody's impact on parser performance is explored in a series of automatic speech processing experiments on the SWITCHBOARD corpus of conversational telephone speech. These experiments explore three ways in which parser performance can be affected by paralexical information: at the sentence level (as defined by sentence boundaries), the sub-sentence level (as defined by intonational phrase boundaries), and the word level (in terms of recognized-word confidences). Within these levels, prosodically marked disfluency information, such as self-corrections and hesitations, is also considered.
At the sentence level, this work demonstrates that state-of-the-art parser performance is critically dependent on accurate sentence segmentation. At the sub-sentence level, it demonstrates that even when sentence boundaries are accurately determined, the posterior probabilities of symbolic prosodic boundary features can help statistical parsers to choose the correct parse. Finally, at the word level, this work introduces a new objective for speech-to-parse systems that jointly characterizes syntactic structure and word accuracy. Within this joint scoring framework, experiments show that considering alternate word hypotheses has a much greater impact on performance than alternate parse hypotheses.