$Date: 2004/05/11 02:30:26 $
From Kevin Duh:
Today's seminar was really interesting. Here are some thoughts:
- We basically talked about the question of knowledge
representation. What struck me was that we kept returning to hard
cases like "Every nephew of some politician runs." The approach here
seems to be to develop a representation that won't break on these
boundary cases. In contrast, purely statistical MT relies on knowledge
gained from the common examples in the data. So on the knowledge
representation side, we focus our energies on the rare examples; on
the statistical side, we focus on the common ones. It's interesting
how differently research communities approach their problems. One
reason may be the different nature of the problems themselves
(i.e., representing all knowledge vs. aligning words/phrases). But I
wonder whether research community culture is a big part of it too.
- I'm trying to think how an MRS representation could be used in a
practical MT transfer system. First, we develop a "parser" that maps
the input word string into an MRS representation. Then, we transform
that MRS into the MRS of a different language. (This might not change
the tree drastically, depending on how concepts are divided up in the
two languages.) Finally, a generator outputs target sentences from
this representation. My question is: which components are most
difficult to design, and therefore might benefit most from machine
learning techniques? The parser/generator is definitely very
difficult, since it is basically the problem of natural language
understanding. But students in Emily's class are already working on
this. What's left, then, is mapping predicates between the two
languages in MRS form. This is what Emily suggested machine learning
could do, so we'll need some aligned MRS trees. Since an MRS tree is
usually flatter than a syntactic tree (as in Yamada & Knight) and
requires fewer reordering operations, it could conceivably be easier
to train. In this sense, a transfer-based approach can be seen as a
divide-and-conquer strategy in which the difficult jobs of parsing,
generation, and semantic transfer are separated. All we need to do is
pick the target sentence that solves

  target_sentence = argmax p(source_MRS | source_sentence)
                         * p(target_MRS | source_MRS)
                         * p(target_sentence | target_MRS)

(the intermediate source and target MRSs are hidden, so in practice
they would also be maximized or summed over).
- When we talked about the sentence "Every nephew of some politician
runs," we got two readings. Some thought run = jogging, while others
thought run = running for office. For the latter, the word
"politician" primed the semantic interpretation. As a fun experiment,
I asked a few other people in our lab about the sentence. Here is one
of the most interesting interpretations of "run" (paraphrased):
"corrupt nephews of some politician running for their lives while an
angry mob of citizens chases them from behind."
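Here is a minimal sketch of the parse / transfer / generate decomposition
above as a toy decoder. Everything in it is a hypothetical assumption for
illustration: the three probability tables, the placeholder MRS labels
("src:run-jog", "tgt:courir", etc.), and the French target strings are made
up, and MRSs are treated as opaque labels rather than real MRS structures.
It maximizes jointly over the hidden MRSs (a Viterbi-style choice) rather
than summing them out.

```python
from typing import Dict, Tuple

# A conditional distribution: conditioning item -> {outcome: probability}.
Dist = Dict[str, Dict[str, float]]


def transfer_decode(source_sentence: str, parse: Dist, transfer: Dist,
                    generate: Dist) -> Tuple[str, float]:
    """Pick the target sentence with the highest score
    p(src_mrs | source_sentence) * p(tgt_mrs | src_mrs) * p(tgt | tgt_mrs),
    maximizing jointly over the hidden source and target MRSs."""
    best_sentence, best_score = "", 0.0
    for src_mrs, p_parse in parse.get(source_sentence, {}).items():
        for tgt_mrs, p_transfer in transfer.get(src_mrs, {}).items():
            for tgt_sentence, p_gen in generate.get(tgt_mrs, {}).items():
                score = p_parse * p_transfer * p_gen
                if score > best_score:
                    best_sentence, best_score = tgt_sentence, score
    return best_sentence, best_score


# Hypothetical toy models for the two readings of "run".
SOURCE = "Every nephew of some politician runs."
PARSE = {SOURCE: {"src:run-jog": 0.4, "src:run-office": 0.6}}
TRANSFER = {
    "src:run-jog": {"tgt:courir": 0.9, "tgt:se-presenter": 0.1},
    "src:run-office": {"tgt:se-presenter": 0.8, "tgt:courir": 0.2},
}
GENERATE = {
    "tgt:courir": {"Chaque neveu d'un politicien court.": 1.0},
    "tgt:se-presenter": {
        "Chaque neveu d'un politicien se présente aux élections.": 1.0},
}

if __name__ == "__main__":
    sentence, score = transfer_decode(SOURCE, PARSE, TRANSFER, GENERATE)
    print(sentence, score)
```

With these made-up numbers the "running for office" reading wins, with a
score of 0.6 * 0.8 * 1.0 = 0.48. Summing over the hidden MRSs instead would
give 0.52 vs. 0.48 for the two target sentences, so the two choices agree on
this example, though they need not in general.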
Kevin