Comments/Questions about today's discussion paper:
=========================================================================
Kwong Tim Ng:

I was wondering why they did not include POS information when they did
word-by-word translation. For example, the English word 'interest' can have
several different meanings. Without POS information, the translation of such
a word may well be wrong.
=========================================================================
From Jeremy Kahn:

The independence assumption (that the node operations {n,r,t} are entirely
mutually independent across nodes, and that within each node n, r, and t are
independent of each other) is flawed but not unreasonable.

Example: Japanese sentence (glossed to English):
    book-OBJ buy-PAST.
English translation might be a passive:
    [A] book [was] buy-PAST.
German (glossed) might be:
    Book-NOM was buy-PASSIVE.
or
    Book-OBJ [I-NOM] buy-PAST
but the following are wrong:
    Book-OBJ was buy-PASSIVE.
or
    Book-NOM [I-OBJ] buy-PAST
(note: inserted words are bracketed)

Note that the translation of OBJ to NOM co-varies with the presence of the
PASSIVE morphemes. However, the language model might be able to identify this
problem -- maybe not with trigrams, though.
=====
The model is initialized with uniform probabilities. Both Katrin and I
suggested that, since EM is prone to getting stuck in local optima, it might
help to start from several different points and/or from a linguistically
motivated start point -- at least, set up the t-table based on words found in
the same sentences, or use one of the IBM models to get an initial alignment.
=====
How does this model vary as different parsers are tried? What characteristics
of the parser are required -- and what characteristics would make this model
difficult to use? For example, they chose to flatten much of the binary
structure out of the traditional syntactic analyses.
=========================================================================
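Sketch for the initialization comment above: one way the "words found in the
same sentences" idea for the t-table could look, as a rough illustration only.
This is not the paper's procedure. The function name init_t_table, the
flat word-list input format, and the toy English/German pairs are all
assumptions made up for the example; a real setup would work over the leaves
of the parsed trees.

```python
from collections import defaultdict


def init_t_table(sentence_pairs):
    """Return t[e][f]: a co-occurrence-based starting estimate of P(f | e).

    sentence_pairs is an iterable of (source_words, target_words) pairs.
    Hypothetical helper: counts each (e, f) pair once per sentence pair in
    which both words occur, then normalizes per source word.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for source_words, target_words in sentence_pairs:
        for e in set(source_words):
            for f in set(target_words):
                counts[e][f] += 1.0

    t = {}
    for e, row in counts.items():
        total = sum(row.values())
        # Normalize so that each row is a probability distribution over f.
        t[e] = {f: c / total for f, c in row.items()}
    return t


# Toy data, invented purely for illustration.
pairs = [
    (["the", "book"], ["das", "buch"]),
    (["the", "house"], ["das", "haus"]),
]
t = init_t_table(pairs)
```

Unlike the uniform start, this gives zero mass to word pairs that never share
a sentence, so EM begins from a sparser table. Whether that actually helps
with the local-optimum problem would have to be checked empirically, e.g. by
comparing against several random restarts.
=========================================================================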