The first stage of the project focused on models for generating intonation patterns from text, including phrasal prominence and tune patterns at the abstract level and F0 and energy contours at the acoustic level. Our strategy was to combine recent developments in linguistic theory and prosodic transcription with statistical signal processing techniques that allow automatic estimation of model parameters. The project resulted in algorithms for (1) predicting prominence placement from text, and (2) generating F0 and energy contours from abstract phonological labels. The algorithms can be easily incorporated into existing synthesis systems, and they received good perceptual ratings from listeners when incorporated into the AT&T TTS system. In addition, preliminary results suggest that the F0 generation model works well for recognition of prosodic labels. Current efforts involve extending and improving the recognition results.
NYNEX, June 1993 - March 1995
NSF, April 1995 - June 1995
Entropic Research Lab, Jan 1996 - Dec 1996
"A Dynamical System Model for Generating F0 for Synthesis," K. Ross and M. Ostendorf, Proceedings of the ESCA/IEEE Workshop on Speech Synthesis, pp. 131-134, Sept. 1994.
"A Dynamical System Model for Recognizing Intonation Patterns," K. Ross and M. Ostendorf, Proc. Eurospeech, Sept. 1995.
"A Dynamical System Model for Generating Fundamental Frequency for Speech Synthesis," K. Ross and M. Ostendorf, submitted manuscript.
"Prediction of Abstract Prosodic Labels for Speech Synthesis," K. Ross and M. Ostendorf, submitted manuscript.