Combination of Arabic Preprocessing Schemes for Statistical Machine Translation

Conference: Proceedings of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL 2006), July 17-21, 2006, Sydney, Australia
Abstract: Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 48757
NPARC number: 8913505
Record identifier: 21a83ebf-dbc5-49f6-9613-e92b3ecd276a
Record created: 2009-04-22
Record modified: 2016-05-09