Unpacking and transforming feature functions: new ways to smooth phrase tables

From National Research Council Canada

Download	Accepted manuscript: Unpacking and transforming feature functions: new ways to smooth phrase tables (PDF, 579 KiB)
Author	Chen, Boxing¹; Kuhn, Roland¹; Foster, George¹; Johnson, Howard¹
Affiliation	National Research Council of Canada. Information and Communication Technologies
Format	Text, Article
Conference	Machine Translation Summit XIII, 19-23 September 2011, Xiamen, China
Abstract	State-of-the-art phrase-based statistical machine translation systems typically contain two features that estimate the “forward” and “backward” conditional translation probabilities for a given pair of source and target phrases. These two “relative frequency” (RF) features are derived from three counts: the joint count of the source and target phrases and their two marginal counts. We propose to “unpack” these three statistics, making them three independent “3-count” features instead of two RF features. In our experiments, the 3-count features perform better than the RF ones in three of the four systems we tested. Slightly transforming and generalizing these 3-count features yields further improvements. Furthermore, under several different experimental conditions, we compare the 3-count and generalized 3-count features to new features derived from Kneser-Ney smoothing, to a new low-frequency penalty feature, and to several known smoothing/discounting schemes. The generalized 3-count features perform similarly to or better than all of the smoothing methods except modified Kneser-Ney. In our experiments, the best phrase table (not language model) smoothing yields gains of +0.6 to +1.4 BLEU.
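The relationship between the two RF features and the three counts can be illustrated with a small sketch. This record does not define the 3-count features precisely, so the code assumes they are the logarithms of the joint count c(s,t) and the marginal counts c(s) and c(t), each used as a separate log-linear feature. The phrase pairs and all function names are hypothetical, and this is not the authors' implementation. Because log P(t|s) = log c(s,t) - log c(s) and log P(s|t) = log c(s,t) - log c(t), the two RF features with weights (1, 1) give the same score as the three log counts with weights (2, -1, -1).

```python
import math
from collections import Counter

def phrase_counts(pairs):
    """Joint and marginal counts from a list of extracted (source, target) phrase pairs."""
    joint = Counter(pairs)
    src = Counter(s for s, _ in pairs)
    tgt = Counter(t for _, t in pairs)
    return joint, src, tgt

def rf_features(s, t, joint, src, tgt):
    """Log forward P(t|s) and log backward P(s|t) relative-frequency estimates."""
    c_st = joint[(s, t)]
    return math.log(c_st / src[s]), math.log(c_st / tgt[t])

def three_count_features(s, t, joint, src, tgt):
    """Assumed 'unpacked' form: the three log counts as separate features."""
    return math.log(joint[(s, t)]), math.log(src[s]), math.log(tgt[t])

def loglinear(features, weights):
    """Weighted sum of feature values, as in a log-linear SMT model."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical extracted phrase pairs (duplicates are repeated extractions).
pairs = [("maison", "house"), ("maison", "house"),
         ("maison", "home"), ("la maison", "the house")]
joint, src, tgt = phrase_counts(pairs)

fwd, bwd = rf_features("maison", "house", joint, src, tgt)            # log(2/3), log(2/2)
three = three_count_features("maison", "house", joint, src, tgt)

rf_score = loglinear((fwd, bwd), (1.0, 1.0))
# The 3-count weights (2, -1, -1) reproduce the RF score exactly.
three_score = loglinear(three, (2.0, -1.0, -1.0))
print(rf_score, three_score)
```

With independently tuned weights, the unpacked model is free to depart from the (2, -1, -1) pattern, which is one way to read the abstract's claim that the three statistics become independent features rather than being fixed inside two ratios.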
Publication date	2011-09-23
In	Proceedings of the 13th Machine Translation Summit (23 September 2011): 269–275.
Language	English
Peer reviewed	Yes
NPARC number	21267976
Record identifier	f943e893-18a5-4f0b-95db-6e080fedb4bb
Record created	2013-03-27
Record modified	2020-06-04

Date modified: 2024-04-19