Using Monolingual Source-Language Data to Improve MT Performance.

Conference: Proceedings of the International Workshop on Spoken Language Translation (IWSLT 2006), November 27-28, 2006, Kyoto, Japan
Abstract: Statistical machine translation systems are usually trained on large amounts of bilingual text and of monolingual text in the target language. In this paper, we present a self-training approach that additionally explores the use of monolingual source text, namely the documents to be translated, to improve system performance. An initial version of the translation system is used to translate the source text. Among the generated translations, target sentences of low quality are automatically identified and discarded. The reliable translations, together with their sources, are then used as a new bilingual corpus for training an additional phrase translation model. Thus, the translation system can be adapted to the new source data even if no bilingual data in this domain is available. Experimental evaluation was performed on a standard Chinese-English translation task. We focus on settings where the domain and/or the style of the test data differs from that of the training material. We show a significant improvement in translation quality through the use of the adaptive phrase translation model. The BLEU score rises by up to 1.1 points, and mWER is reduced by up to 3.1% absolute.
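The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how low-quality translations are identified, so the confidence score, the threshold value, and the names `build_adaptation_corpus` and `toy_translate` are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: a translator maps a source sentence to a
# (translation, confidence) pair. The abstract does not specify the
# confidence measure; a score in [0, 1] is assumed here.
Translator = Callable[[str], Tuple[str, float]]


def build_adaptation_corpus(
    sources: Iterable[str],
    translate: Translator,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Translate each source sentence and keep only the confident pairs."""
    corpus = []
    for src in sources:
        hyp, confidence = translate(src)
        # Discard translations judged unreliable (assumed criterion).
        if confidence >= threshold:
            corpus.append((src, hyp))
    return corpus


def toy_translate(sentence: str) -> Tuple[str, float]:
    """Stand-in for the baseline MT system (illustration only)."""
    lexicon = {"ni hao": ("hello", 0.9), "xie xie": ("thank you", 0.8)}
    return lexicon.get(sentence, ("<unk>", 0.1))


pairs = build_adaptation_corpus(["ni hao", "zai jian", "xie xie"], toy_translate)
print(pairs)  # [('ni hao', 'hello'), ('xie xie', 'thank you')]
```

In the approach the abstract describes, the retained pairs would then serve as the new bilingual corpus for training the additional phrase translation model; that training step is outside the scope of this sketch.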
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 48808
NPARC number: 8914333
Record identifier: de5c3ff8-2697-49bf-8470-347d38d6eee8
Record created: 2009-04-22
Record modified: 2016-05-09