The first stage of the project focused on models for the
generation of intonation patterns from text, including phrasal
prominence and tune patterns at the abstract level and fundamental
frequency (F0) and energy contours at the acoustic level.
Our strategy was to combine the results of recent
developments in linguistic theory and prosodic transcription with
sophisticated statistical signal processing techniques that allow
automatic estimation of model parameters. The project resulted in
algorithms for (1) predicting prominence placement from text, and (2)
generating F0 and energy contours from abstract phonological labels.
The algorithms can be easily incorporated into
existing synthesis systems, and they received good perceptual
ratings from listeners when incorporated into the AT&T TTS
system. In addition, preliminary results suggest that the F0
generation model works well
for recognition of prosodic labels. Current efforts involve extending
and improving the recognition results.
SPONSORS:
NYNEX, June 1993 - March 1995
NSF, April 1995 - June 1995
Entropic Research Lab, Jan 1996 - Dec 1996
PUBLICATIONS:
"A Dynamical System Model for Generating F0 for Synthesis," K. Ross and M. Ostendorf, Proceedings of the ESCA/IEEE Workshop on Speech Synthesis, pp. 131-134, Sept. 1994.
"A Dynamical System Model for Recognizing Intonation Patterns," K. Ross and M. Ostendorf, Proc. Eurospeech, Sept. 1995.
"A Dynamical System Model for Generating Fundamental Frequency for Speech Synthesis," K. Ross and M. Ostendorf, submitted manuscript.
"Prediction of Abstract Prosodic Labels for Speech Synthesis," K. Ross and M. Ostendorf, submitted manuscript.