The overall objective of this work is to investigate statistical modelling of the intonation patterns produced in isolated words and in continuous speech. Our approach involves four major experiments: (1) isolated word intonation recognition, (2) boundary tone clustering, (3) boundary tone classification, and (4) boundary tone spotting in continuous speech.
We employ discrete hidden Markov models (HMMs) to characterize intonation patterns, because HMMs have been successful in modelling the random spectral and temporal structure of speech for word recognition. Since we use discrete-distribution HMMs, vector quantization of the features is necessary to generate discrete observations, and different methods of vector quantization are explored.
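As a rough illustration of why vector quantization is needed, the sketch below maps each continuous feature vector to the index of its nearest codeword, which gives the discrete symbols a discrete HMM consumes. The function name `quantize`, the squared-Euclidean distortion measure, and the toy codebook and frames are assumptions made for illustration; they are not taken from this work.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword.

    Assumption: squared Euclidean distance is the distortion measure.
    """
    symbols = []
    for frame in frames:
        distortions = [
            sum((x - c) ** 2 for x, c in zip(frame, codeword))
            for codeword in codebook
        ]
        # Index of the codeword with the smallest distortion
        symbols.append(distortions.index(min(distortions)))
    return symbols


# Hypothetical 2-D features and a three-entry codebook
codebook = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
frames = [(0.9, 0.1), (0.1, -0.1), (-0.8, 0.0)]
symbols = quantize(frames, codebook)
```

The resulting list of codeword indices is the observation sequence passed to the HMM; how the codebook itself is trained is a separate design choice.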
For isolated word intonation recognition, we search for the best combination of feature processing, vector quantization, and hidden Markov modelling techniques for recognizing statement, question, command, calling, and continuation patterns. A best-case accuracy of 89% was achieved using minimum distortion VQ and 3-state HMMs.
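Recognition with one discrete HMM per intonation pattern is commonly done by scoring the quantized observation sequence against every model and choosing the most likely one. The sketch below shows that general scheme with a scaled forward algorithm. The left-to-right 3-state topology, the toy parameters, and the two hypothetical labels "rise" and "fall" are assumptions for illustration only, not the models or classes trained in this work.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik


def classify(obs, models):
    """Pick the label whose model assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Toy 3-state left-to-right models over symbols 0 (low), 1 (mid), 2 (high)
pi = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
B_rise = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
B_fall = [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]
models = {"rise": (pi, A, B_rise), "fall": (pi, A, B_fall)}

label = classify([0, 0, 1, 2, 2], models)
```

In practice each model's parameters would be estimated from training data (for example with Baum-Welch) rather than set by hand as they are here.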
For boundary tone clustering, HMMs are used to characterize each cluster. Distinctions finer than "rise" and "fall" are obtained using a divisive clustering procedure. Typically, these distinctions were associated with prominence. A boundary tone classification experiment correctly identified discretely extracted boundary tones as rise or fall with 86% accuracy. Finally, boundary tones were spotted in continuous speech at an average detection rate of 33% with a false alarm rate of 1.0 per known boundary tone.