* complaint: state-of-the-art SMT is lexical, i.e. each surface form in the source and target is treated as an independent entity. this is bad because of data sparsity, which is bad enough for popular languages like english but a nightmare for any language with rich morphology (and nearly anything is richer than english).
  goals:
  - generalize over morphology to reduce data sparsity
  - model syntactic coherence (morphological agreement) in the target language. this has been done before, but mostly by using a target LM (related work: Mei+Katrin, Kevin+Katrin, Amittai).
  there's also been a lot of work on modifying the translation input so it better approximates the structure of the desired output, in the hope that this improves alignments and thus translation (see section 3 of the paper).
* this paper trains a model that predicts inflected forms, given the source sentence and a sequence of stems in the target sentence. it uses the words and word alignments, plus "lexical resources that provide morph info about the words", so whatever you have (probably a parser and a dictionary; they also had projected dependency structures, presumably created by the MSR treelet translation project).
  worked on english--russian and english--arabic.
  [discussion of russian + arabic morphology omitted]
  a characteristic of morphologically complex languages is a richer system of agreement, so some non-lexical notions are needed if agreement is going to spread over a larger chunk of the sentence than just adjacent words.
* inflection prediction: for a word w,
  - stemming: the set of possible lemmas for w
  - inflection: the set of surface forms with the same stem as w
  - morphological analysis: the set of analysis vectors for w
* the learning model
  - a MaxEnt Markov model (MEMM), which decomposes the probability of a predicted inflection sequence into a product of local probabilities for the individual predictions, each conditioned on a small number of previous predictions. here they used second-order (trigram) prediction, i.e. each prediction is conditioned on the previous two.
  - the features pair items from the context (what you have so far on the target side, plus the source element for the current slot) with the target label for the current element.
  - structured prediction: the new target word has to match the predicted morphological analysis vector, not just be the highest-probability word given the previous two elements.
* features and their categories
  two axes for kinds of features:
  - monolingual (target language only) vs. bilingual (obtained by following the word alignments back into the source sentence)
  - lexical (surface forms + stems; this looks kinda like a trigram LM, except it can also use information from the right context) vs. morphological (describes the target label and its context, so a sequence of POS tags would count) vs. syntactic (derived from a parser, e.g. the stem of a word's parent)
  examples by category:
  - monolingual lexical: stem of the word and its adjacent words, plus LM features
  - monolingual morphological: attributes of the current prediction plus the two previous ones
  - monolingual syntactic: stem of the parent node
  - bilingual lexical: words aligned to the current one, plus all words aligned to its immediate neighbors
  - bilingual morphological and syntactic: it's complicated.
* experiments
  - to cut down on noise, they used aligned sentence pairs from reference translations as the input to their postprocessing, rather than SMT output.
  - data: technical (limited) domain; 1 million aligned en-ru sentence pairs, 500k en-ar.
  - lexicon: a general-domain russian lexicon was available; buckwalter for arabic.
  - baselines: picking a morphological inflection at random from the possible ones for the current word ("how does our system fare compared to chance?"), and a trigram LM.
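The random-chance baseline has a closed-form expected accuracy: if each word's correct form is always in its candidate set I(w) and the pick is uniform, a token is right with probability 1/|I(w)|, so expected accuracy is the average of 1/|I(w)| over tokens. This is why chance is so low for morphologically rich languages. A small sketch that computes this and checks it with a seeded simulation; the candidate sets below are hypothetical toy data (russian adjective/noun forms), not the paper's.

```python
import random
from statistics import mean


def expected_chance_accuracy(candidate_sets):
    """Expected accuracy of a uniform random pick, assuming the gold form is always a candidate."""
    return mean(1 / len(c) for c in candidate_sets)


def simulated_chance_accuracy(candidate_sets, gold, trials, seed=0):
    """Monte Carlo estimate: pick uniformly per token and count exact matches with gold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for cands, g in zip(candidate_sets, gold):
            hits += rng.choice(cands) == g
    return hits / (trials * len(gold))


# Hypothetical toy tokens: an adjective with 4 candidate forms and a noun with 2.
cands = [["новый", "новая", "новое", "новую"], ["книга", "книгу"]]
gold = ["новую", "книгу"]
print(expected_chance_accuracy(cands))  # 0.375
sim = simulated_chance_accuracy(cands, gold, 20000)
print(round(sim, 3))
```

The simulation should land close to 0.375; words with a single possible form count as always correct, which inflates the chance baseline for tokens that don't inflect.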
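A minimal sketch of the second-order MEMM from "the learning model", under a toy setup that is not from the paper: a hypothetical two-stem russian lexicon mapping analysis labels to surface forms, a few hypothetical feature templates (current stem, right-context stem, label history, aligned source words), a made-up `agree` indicator, and hand-set weights instead of trained ones. It shows the decomposition log P(y | x) = sum_i log p(y_i | y_{i-2}, y_{i-1}, x), with each local distribution normalized only over the analyses the lexicon licenses for that stem (the structured-prediction constraint). Decoding here is exact second-order Viterbi; the paper's actual features, training, and search may differ.

```python
import math

START = "<s>"

# Hypothetical toy lexicon: stem -> {analysis label (gender.case.number): surface form}.
LEXICON = {
    "нов": {"m.nom.sg": "новый", "f.nom.sg": "новая", "n.nom.sg": "новое", "f.acc.sg": "новую"},
    "книг": {"f.nom.sg": "книга", "f.acc.sg": "книгу"},
}


def features(stems, src_aligned, i, prev2, prev1, label):
    """Hypothetical feature templates; each pairs a context item with the candidate label."""
    nxt = stems[i + 1] if i + 1 < len(stems) else "</s>"
    feats = [
        f"stem={stems[i]}|y={label}",               # monolingual lexical: current stem
        f"next_stem={nxt}|y={label}",               # monolingual lexical: right context
        f"prev1={prev1}|y={label}",                 # monolingual morphological: label history
        f"prev2={prev2}|prev1={prev1}|y={label}",   # second-order label history
        f"agree={prev1 == label}",                  # assumption: crude agreement indicator
    ]
    feats += [f"src={w}|y={label}" for w in src_aligned[i]]  # bilingual lexical
    return feats


def local_log_probs(weights, stems, src_aligned, i, prev2, prev1):
    """log p(y_i | y_{i-2}, y_{i-1}, x), normalized only over analyses licensed for stems[i]."""
    cands = LEXICON[stems[i]]
    scores = {
        y: sum(weights.get(f, 0.0) for f in features(stems, src_aligned, i, prev2, prev1, y))
        for y in cands
    }
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {y: s - log_z for y, s in scores.items()}


def decode(weights, stems, src_aligned):
    """Exact second-order Viterbi; returns (analyses, surface forms, log P(y | x))."""
    if not stems:
        return [], [], 0.0
    chart = {(START, START): 0.0}  # state (y_{i-1}, y_i) -> best prefix log prob
    backptrs = []                  # backptrs[i][(y_{i-1}, y_i)] = best y_{i-2}
    for i in range(len(stems)):
        new_chart, bp = {}, {}
        for (prev2, prev1), score in chart.items():
            for y, lp in local_log_probs(weights, stems, src_aligned, i, prev2, prev1).items():
                key = (prev1, y)
                if key not in new_chart or score + lp > new_chart[key]:
                    new_chart[key] = score + lp
                    bp[key] = prev2
        chart = new_chart
        backptrs.append(bp)
    (y_prev, y_last), best = max(chart.items(), key=lambda kv: kv[1])
    labels = [y_last, y_prev]
    for i in range(len(stems) - 1, 1, -1):
        labels.append(backptrs[i][(labels[-1], labels[-2])])
    labels = labels[::-1][-len(stems):]  # drop START padding for one-word inputs
    surfaces = [LEXICON[s][y] for s, y in zip(stems, labels)]
    return labels, surfaces, best


weights = {"agree=True": 2.0, "stem=книг|y=f.nom.sg": 1.0}  # hand-set, not trained
labels, forms, logp = decode(weights, ["нов", "книг"], [["new"], ["book"]])
print(forms, round(math.exp(logp), 3))  # ['новая', 'книга'] 0.238
```

Because each local distribution ranges only over the analyses licensed for that stem, the output is always a valid inflection of the given stem sequence; the agreement preference comes purely from features that pair the previous label with the current one.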