M. Ostendorf,
``Moving beyond the `beads-on-a-string' model of speech,''
Proc. IEEE ASRU Workshop, 1999, to appear.

The notion that a word is composed of a sequence of phone segments, sometimes referred to as `beads on a string', has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech. We offer two alternatives -- automatically derived subword units and linguistically motivated distinctive feature systems -- and discuss current work in these directions. In addition, we look at problems that arise in acoustic modeling when trying to incorporate higher-level structure with these two strategies.
