From Kevin Duh: I wonder how important it is in phrase-based SMT to get better translation probabilities for phrase pairs. The paper presents an interesting solution, but is there more work out there? I see an analogy to the data sparsity problem in n-gram language modeling with long contexts: there, too, probability estimates become unreliable when an n-gram has many conditioning factors, because most long contexts are seen only a few times. As a result, there is a whole body of research (e.g. smoothing, class-based LMs) that tries to address this problem. I wonder if we can adapt some of these LM techniques to phrase pair probabilities? (For example, Katrin suggested using FLMs to obtain robust estimates through morphology.)
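
To make the analogy concrete, here is a minimal sketch of one LM smoothing technique, interpolated absolute discounting, applied to phrase pair counts. This is not the method from the paper under discussion. The function names, the discount of 0.5, and the choice of the target-phrase marginal as the backoff distribution are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequency(pair_counts):
    """Unsmoothed estimate p(tgt | src) = c(src, tgt) / c(src)."""
    src_totals = Counter()
    for (src, _tgt), c in pair_counts.items():
        src_totals[src] += c
    return {pair: c / src_totals[pair[0]] for pair, c in pair_counts.items()}


def absolute_discount(pair_counts, discount=0.5):
    """Return a function prob(src, tgt) using interpolated absolute discounting.

    Assumes integer counts >= 1 and 0 < discount < 1. A fixed discount is
    subtracted from each seen pair count, and the freed mass is given to a
    backoff distribution (here, hypothetically, the target-phrase marginal).
    """
    src_totals = Counter()  # c(src)
    src_types = Counter()   # number of distinct targets seen with src
    tgt_totals = Counter()  # c(tgt), used for the backoff distribution
    for (src, tgt), c in pair_counts.items():
        src_totals[src] += c
        src_types[src] += 1
        tgt_totals[tgt] += c
    total = sum(tgt_totals.values())

    def prob(src, tgt):
        backoff = tgt_totals[tgt] / total
        if src_totals[src] == 0:
            # Unseen source phrase: fall back entirely on the marginal.
            return backoff
        c = pair_counts.get((src, tgt), 0)
        # Mass removed from the seen pairs of this source phrase.
        lam = discount * src_types[src] / src_totals[src]
        return max(c - discount, 0) / src_totals[src] + lam * backoff

    return prob
```

With toy counts in which a source phrase was extracted only once, the relative-frequency estimate assigns its single translation probability 1, while the discounted estimate reserves part of that mass for the backoff distribution. This is the same effect smoothing has on rarely seen long contexts in an n-gram LM.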