$Date: 2004/05/11 02:30:26 $
From Kevin Duh: Today's seminar was really interesting. Here are some of my thoughts:

  1. We basically talked about the question of knowledge representation. What struck me was that we kept coming back to tough examples like "Every nephew of some politician runs." The approach here seems to be to develop a representation that won't break on these boundary cases. In contrast, purely statistical MT relies on knowledge gained from the common examples in the data. So on the knowledge representation side, we focus our energies on the rare examples; on the statistical side, we focus on the common ones. I find it interesting how differently various research communities approach their problems. One reason may be the different nature of the problems themselves (i.e., representing all knowledge vs. aligning words/phrases). But I wonder whether research-community culture is a big part of it too.
  2. I'm trying to think how an MRS representation could be used in a practical MT transfer system. First, we develop a "parser" that maps input word strings into the MRS representation. Then, we transform that MRS into the MRS of a different language. (This might not change the tree drastically, depending on how concepts are divided up in the two languages.) Finally, a generator outputs target sentences based on this representation. My question is: which components are most difficult to design, and therefore might benefit most from machine learning techniques? The parser/generator is definitely very difficult, since it is basically the problem of natural language understanding. But students in Emily's class are already working on this. What's left, then, is mapping predicates between two languages in MRS form. This is what Emily suggested machine learning could do, so we'll need some aligned MRS trees. Since these trees are usually flatter than syntactic trees (as in Yamada & Knight) and require fewer reordering operations, the transfer model could conceivably be easier to train. In this sense, a transfer-based approach can be seen as a divide-and-conquer strategy in which the difficult jobs of parsing, generation, and semantic transfer are separated. All we need to do is pick the sentence that solves this equation: target_sentence = argmax p(source_MRS|source_sentence) * p(target_MRS|source_MRS) * p(target_sentence|target_MRS), where the argmax ranges over candidate target sentences and their intermediate source and target MRSs.
  3. When we talked about the sentence "Every nephew of some politician runs," we got two readings. Some thought run=jogging, while others thought run=running for office. For the latter, the word "politician" primed the semantic interpretation. As a fun experiment, I asked a few other fellows in our lab about the sentence. Here is by far the most interesting interpretation of "run": to paraphrase, it means "corrupt nephews of some politician running for their lives while an angry mob of citizens chases them from behind."
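To make the equation in point 2 concrete, here is a toy sketch of a decoder that scores every parse -> transfer -> generate path and keeps the best one. Everything in it is hypothetical: the three probability tables, the MRS labels (mrs_run_jog, mrs_courir, etc.), the French target sentences, and the numbers are made up for illustration, standing in for a real parser, a trained MRS-to-MRS transfer model, and a generator. It also assumes the argmax is taken jointly over the hidden MRSs (a Viterbi-style max) rather than summing them out.

```python
import math
from typing import Dict, Tuple

# Hypothetical toy models: each maps a conditioning item to a distribution
# over outcomes, i.e. TABLE[condition][outcome] = p(outcome | condition).
PARSE: Dict[str, Dict[str, float]] = {  # p(source_MRS | source_sentence)
    "every nephew of some politician runs": {
        "mrs_run_jog": 0.3,
        "mrs_run_office": 0.7,
    },
}
TRANSFER: Dict[str, Dict[str, float]] = {  # p(target_MRS | source_MRS)
    "mrs_run_jog": {"mrs_courir": 1.0},
    "mrs_run_office": {"mrs_se_presenter": 0.8, "mrs_courir": 0.2},
}
GENERATE: Dict[str, Dict[str, float]] = {  # p(target_sentence | target_MRS)
    "mrs_courir": {"chaque neveu d'un politicien court": 1.0},
    "mrs_se_presenter": {
        "chaque neveu d'un politicien se présente aux élections": 1.0,
    },
}


def decode(source: str) -> Tuple[str, float]:
    """Return the best target sentence and its log-score.

    Maximizes log p(s_mrs|source) + log p(t_mrs|s_mrs) + log p(target|t_mrs)
    jointly over the intermediate MRSs and the target sentence. Summing logs
    instead of multiplying probabilities avoids underflow on long products.
    """
    best_sentence, best_score = None, -math.inf
    for s_mrs, p_parse in PARSE.get(source, {}).items():
        for t_mrs, p_transfer in TRANSFER.get(s_mrs, {}).items():
            for target, p_gen in GENERATE.get(t_mrs, {}).items():
                score = math.log(p_parse) + math.log(p_transfer) + math.log(p_gen)
                if score > best_score:
                    best_sentence, best_score = target, score
    if best_sentence is None:
        raise ValueError("no parse/transfer/generate path for this sentence")
    return best_sentence, best_score


if __name__ == "__main__":
    sentence, score = decode("every nephew of some politician runs")
    print(sentence, math.exp(score))
```

With these made-up numbers the "run for office" path wins, since 0.7 * 0.8 = 0.56 beats 0.3 * 1.0 = 0.30 and 0.7 * 0.2 = 0.14. Keeping the three tables separate mirrors the divide-and-conquer idea: each component could be swapped for a better (or learned) model without touching the search.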
Kevin