In the first half of the dissertation, general approaches to topic learning are investigated. Algorithms for combining different partitions are proposed and evaluated on a number of text corpora, where they improve on established baselines. In addition, a novel feature augmentation method is developed that adds to the bag-of-words representation a small number of word pairs that exhibit a pattern distinct from that of their constituent words. The approach is evaluated on different corpora, and the results show a consistent performance gain for a number of learning methods.
In the second half of the dissertation, issues that are relevant for
topic learning in conversational speech are investigated. In the area
of prosody, the studies focus on prominence, loosely defined as the
phrase-level emphasis given to one or more syllables of a
word. Experiments using average word statistics from an automatic
prominence detector reveal that lack of prominence is an excellent
indicator of low-salience words. The role of disfluencies is
investigated using hand-annotated self-corrections. The experiments
reveal that removing disfluencies has little impact on topic
classification when using the standard bag-of-words
representation. In addition, a quantitative analysis of lexical patterns
across genders in conversations is conducted, revealing important
differences associated with the gender of the conversational
partner. However, integrating gender information into a topic detection
system does not improve topic classification performance. Finally,
the impact of the errors introduced by the automatic speech
recognition (ASR) component is assessed. A method to cluster words
according to a confusability measure derived from the ASR system is
proposed and shown to offer performance gains compared to using 1-best
transcripts and computational gains compared to using multiple ASR
hypotheses.