Manageable Phrase-based Statistical Machine Translation Models

From National Research Council Canada

DOI	https://doi.org/10.1007/978-3-540-75175-5_55
Author	Badr, Ghada¹; Joanis, Eric¹; Larkin, Samuel¹; Kuhn, Roland¹
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Conference	5th International Conference on Computer Recognition Systems CORES 07, Wroclaw, Poland, October 22-25, 2007
Abstract	Statistical Machine Translation (SMT) is an evolving field where many techniques in Syntactic Pattern Recognition (SPR) are needed and applied. A typical phrase-based SMT system for translating from a source language (S) to a target language (T) contains one or more n-gram language models (LMs) and one or more phrase translation models (TMs). These LMs and TMs have a large memory footprint (up to several gigabytes). This paper describes novel techniques for filtering these models that ensure only relevant patterns in the LMs and TMs are loaded during translation. In experiments on a large Chinese-English task, these techniques yielded significant reductions in the amount of information loaded during translation: up to 58% reduction for LMs, and up to 75% for TMs.
Publication date	2007
In	Computer Recognition Systems 2 (Advances in Intelligent and Soft Computing, vol. 45) (2007): 437–444.
Language	English
Peer reviewed	Yes
NRC number	NRCC 49891
NPARC number	9183591
Record identifier	f2a4386f-564f-44d4-9c01-c437390b8bb3
Record created	2009-06-30
Record modified	2020-05-10

Date modified: 2024-04-18