Phrase clustering for smoothing TM probabilities – or, how to extract paraphrases from phrase tables

Proceedings title: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
Conference: The 23rd International Conference on Computational Linguistics (COLING 2010), August 23-27, 2010, Beijing, China
Pages: 608–616; # of pages: 9
Subject: Information and Communication Technologies
Abstract: This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to source-language and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible uses. This paper looks at one of these uses: smoothing the conditional translation model (TM) probabilities employed by the SMT system. We incorporated phrase-cluster-derived probability estimates into a baseline log-linear feature combination that included relative-frequency and lexically weighted conditional probability estimates. In Chinese-English (C-E) and French-English (F-E) learning-curve experiments, we obtained a gain over the baseline in 29 of 30 tests, with a maximum gain of 0.55 BLEU points (though most gains were fairly small). The largest gains came with medium (200-400K sentence pairs) rather than with small (less than 100K sentence pairs) amounts of training data, contrary to what one would expect from the paraphrasing literature. We have only begun to explore the original smoothing approach described here.
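To make the idea of "smoothing TM probabilities with phrase clusters" concrete, here is a minimal Python sketch under stated assumptions. It does not implement the paper's clustering or its exact estimator. Instead it assumes cluster labels are already given and uses a generic class-based decomposition, p_c(t|s) = p(t | T(t)) · p(T(t) | S(s)), as a stand-in for a phrase-cluster-derived estimate. It then combines that estimate with a relative-frequency estimate in a log-linear score. The toy phrase table (`PAIR_COUNTS`), the cluster maps (`SRC_CLUSTER`, `TGT_CLUSTER`), the feature weights, and all function names are hypothetical. The lexically weighted features mentioned in the abstract are omitted.

```python
import math
from collections import Counter

# Hypothetical toy phrase table: (source phrase, target phrase) -> joint count.
PAIR_COUNTS = {
    ("maison", "house"): 8,
    ("maison", "home"): 2,
    ("demeure", "residence"): 1,
    ("demeure", "house"): 1,
    ("chat", "cat"): 5,
    ("chat", "home"): 1,  # a noisy alignment
}

# Hypothetical cluster labels. The paper derives clusters from the phrase
# table itself; that clustering step is NOT implemented in this sketch.
SRC_CLUSTER = {"maison": "S1", "demeure": "S1", "chat": "S2"}
TGT_CLUSTER = {"house": "T1", "home": "T1", "residence": "T1", "cat": "T2"}


def relative_frequency(pair_counts):
    """p_rf(t|s) = c(s, t) / sum over t' of c(s, t'), for seen pairs only."""
    src_totals = Counter()
    for (s, _t), c in pair_counts.items():
        src_totals[s] += c
    return {(s, t): c / src_totals[s] for (s, t), c in pair_counts.items()}


def cluster_estimator(pair_counts, src_cluster, tgt_cluster):
    """Return prob(s, t) = p(t | T(t)) * p(T(t) | S(s)) (assumed class-based form).

    Counts are pooled over clusters, so a pair never seen together still
    gets nonzero mass if its source and target clusters co-occur.
    """
    cluster_pair = Counter()       # c(S, T)
    src_cluster_total = Counter()  # c(S)
    tgt_count = Counter()          # c(t) as a target phrase
    tgt_cluster_total = Counter()  # c(T)
    for (s, t), c in pair_counts.items():
        S, T = src_cluster[s], tgt_cluster[t]
        cluster_pair[(S, T)] += c
        src_cluster_total[S] += c
        tgt_count[t] += c
        tgt_cluster_total[T] += c

    def prob(s, t):
        S, T = src_cluster[s], tgt_cluster[t]
        p_t_given_T = tgt_count[t] / tgt_cluster_total[T]
        p_T_given_S = cluster_pair[(S, T)] / src_cluster_total[S]
        return p_t_given_T * p_T_given_S

    return prob


def loglinear_score(features, weights):
    """Log-linear combination: sum over i of lambda_i * log h_i."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


if __name__ == "__main__":
    p_rf = relative_frequency(PAIR_COUNTS)
    p_c = cluster_estimator(PAIR_COUNTS, SRC_CLUSTER, TGT_CLUSTER)
    weights = {"rf": 1.0, "cluster": 0.5}  # hypothetical; normally tuned
    for s, t in [("maison", "house"), ("chat", "home")]:
        feats = {"rf": p_rf[(s, t)], "cluster": p_c(s, t)}
        print(s, t, round(feats["rf"], 3), round(feats["cluster"], 3),
              round(loglinear_score(feats, weights), 3))
    # A pair absent from the toy table still receives cluster-based mass.
    print("maison residence", round(p_c("maison", "residence"), 3))
```

In this toy setup, pooling counts at the cluster level lowers the estimate for the noisy pair ("chat", "home") from 1/6 to 1/26 and assigns nonzero probability to the unseen pair ("maison", "residence"). It also lowers the legitimate but rare pair ("demeure", "residence"), which is one reason to add the cluster estimate as an extra log-linear feature, with its weight set by tuning, rather than to replace the relative-frequency estimate outright.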
Affiliation: National Research Council Canada (NRC-CNRC); NRC Institute for Information Technology
Peer reviewed: Yes
NPARC number: 15736686
Record identifier: 68e35bd5-b0b9-4e25-8be2-36e382b8aa1b
Record created: 2010-07-05
Record modified: 2016-05-09