Automatic Speech Recognition as a Model for Human Speech Perception


The goal of this project is to use an automatic speech recognizer (ASR) to simulate and predict properties of human speech perception. This approach builds on the more general methodology of using computational learning algorithms as models of human learning, which has attracted increasing attention over the past decade. Although it is widely agreed that ASR is not an adequate model of adult speech perception, it may be a reasonable approximation to the perception of phonetic categories by infants, for whom certain factors, such as higher-level contextual knowledge, do not play a significant role.
Our specific goal is to model the acquisition of phonetic distinctions (phoneme categories) by infant listeners, using both 'motherese' (hyperarticulated) speech data and adult speech data as training and test material for a statistical speech recognizer. Experimental studies on infant speech perception have shown that infants prefer extremely clear, hyperarticulated motherese speech to normal adult speech. In motherese, the formant space regions covered by different vowels are spread further apart than in adult speech. It is likely that these extreme or prototypical 'training samples' help infants to acquire phonetic categories and to make appropriate generalizations when classifying normal adult speech. If automatic speech recognition were a good model of human speech perception, the same effect should be observable when an automatic speech recognizer is trained on motherese speech samples and tested on adult speech. It is generally known in the speech recognition community that mismatched training and test conditions decrease performance. However, it has not yet been tested whether systematically mismatched training data in the form of prototypical examples have a beneficial effect on classifying test data in which the regions of different classes overlap to a greater degree.
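The train/test design described above can be illustrated with a small toy simulation. The following is a minimal sketch under stated assumptions, not the recognizer used in the project: the vowel means in (F1, F2) space, the expansion factor of 1.4 for motherese, the common spread of 150 Hz, and all function names are hypothetical values chosen for illustration only, and a simple diagonal-Gaussian classifier stands in for a statistical speech recognizer. It compares a classifier trained on synthetic adult-like vowels with one trained on synthetic motherese-like vowels, both tested on adult-like vowels.

```python
import numpy as np

# Hypothetical (F1, F2) means in Hz for three corner vowels; illustrative only.
ADULT_MEANS = np.array([[300.0, 2300.0],   # /i/
                        [750.0, 1300.0],   # /a/
                        [350.0, 900.0]])   # /u/

def expand_means(means, factor):
    """Move class means away from their common centroid (assumed motherese effect)."""
    centroid = means.mean(axis=0)
    return centroid + factor * (means - centroid)

def sample_vowels(means, spread, n_per_class, rng):
    """Draw synthetic (F1, F2) tokens around each class mean."""
    X = np.concatenate([rng.normal(m, spread, size=(n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y

def fit_gaussians(X, y):
    """Estimate a per-class mean and diagonal variance from labeled tokens."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return means, variances

def predict(X, means, variances):
    """Assign each token to the class with the highest Gaussian log-likelihood."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    return np.argmax(log_lik, axis=1)

def run_experiment(seed=0, spread=150.0, factor=1.4, n_per_class=500):
    """Return accuracy on adult-like test data for matched and motherese training."""
    rng = np.random.default_rng(seed)
    motherese_means = expand_means(ADULT_MEANS, factor)
    X_adult_train, y_adult_train = sample_vowels(ADULT_MEANS, spread, n_per_class, rng)
    X_moth_train, y_moth_train = sample_vowels(motherese_means, spread, n_per_class, rng)
    X_test, y_test = sample_vowels(ADULT_MEANS, spread, n_per_class, rng)
    results = {}
    for name, (X, y) in {"adult": (X_adult_train, y_adult_train),
                         "motherese": (X_moth_train, y_moth_train)}.items():
        means, variances = fit_gaussians(X, y)
        results[name] = float(np.mean(predict(X_test, means, variances) == y_test))
    return results

if __name__ == "__main__":
    for condition, acc in run_experiment().items():
        print(f"trained on {condition:9s} -> accuracy on adult test data: {acc:.3f}")
```

The numbers printed by this sketch depend entirely on the assumed synthetic parameters and say nothing about real speech; the point is only to make the matched versus systematically mismatched training conditions concrete.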
The potential benefits of this project are twofold. First, the suitability of ASR as a computational model of phonetic learning will be tested. Second, the results from this pilot project will be used to develop improved training and adaptation methods for automatic speech recognition.

SPONSOR

Institute for Learning and Brain Sciences

PUBLICATIONS: