I. Shafran and M. Ostendorf,
"Use of higher level linguistic structure in acoustic modeling for speech recognition,"
Proc. ICASSP, June 2000, pp. 1021-1024.
Current speech recognition systems perform poorly on
conversational speech compared to read speech, largely because of
the additional acoustic variability it exhibits. Our hypothesis is
that there are systematic effects, related to higher-level linguistic
structures, that current acoustic models do not capture. In this paper
we describe a method that extends standard clustering to incorporate
such features when estimating acoustic models. We report recognition
improvements on the Switchboard task over triphones and pentaphones
from the use of word- and syllable-level features. In addition, we
report preliminary studies on clustering with prosodic information.
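The abstract's idea of extending standard clustering with higher-level questions can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' implementation: it uses a variance-reduction impurity as a stand-in for acoustic likelihood, and the feature names (`syl_pos`, `word_final`) are invented for illustration. The point is only that word- and syllable-level questions compete with ordinary phonetic-context questions when choosing a split.

```python
# Toy sketch of clustering contexts with both phonetic and
# higher-level (syllable/word) questions. Feature names and data
# are hypothetical; impurity is a simple variance proxy.

def variance(values):
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def impurity(samples):
    # Total squared deviation of the scalar stand-in for acoustic stats.
    return len(samples) * variance([v for _, v in samples])

def best_split(samples, questions):
    # Greedily pick the yes/no question with the largest impurity reduction.
    base = impurity(samples)
    best = None
    for feat, val in questions:
        yes = [s for s in samples if s[0].get(feat) == val]
        no = [s for s in samples if s[0].get(feat) != val]
        if not yes or not no:
            continue
        gain = base - impurity(yes) - impurity(no)
        if best is None or gain > best[0]:
            best = (gain, (feat, val), yes, no)
    return best

# Each sample: (context features, a scalar stand-in for acoustic data).
samples = [
    ({"left": "s", "syl_pos": "onset", "word_final": False}, 1.0),
    ({"left": "s", "syl_pos": "coda",  "word_final": True},  3.0),
    ({"left": "t", "syl_pos": "onset", "word_final": False}, 1.2),
    ({"left": "t", "syl_pos": "coda",  "word_final": True},  3.1),
]

# Standard phonetic-context questions plus higher-level ones.
questions = [
    ("left", "s"),
    ("left", "t"),
    ("syl_pos", "coda"),    # syllable-level question
    ("word_final", True),   # word-level question
]

gain, question, yes, no = best_split(samples, questions)
print(question)  # prints ('syl_pos', 'coda')
```

In this contrived data, the syllable-level question separates the observations better than either phonetic question, so the greedy splitter selects it; a real system would use Gaussian likelihood gains over HMM state statistics instead of the scalar variance used here.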