Abstract | State-of-the-art phrase-based statistical machine translation systems typically contain two features which estimate the “forward” and “backward” conditional translation probabilities for a given pair of source and target phrases. These two “relative frequency” (RF) features are derived from three counts: the joint count of the source and target phrases and their two marginal counts. We propose to “unpack” these three statistics, making them independent “3-count” features instead of two RF features. In our experiments, the 3-count features perform better than the RF ones in three of the four systems we tested. By transforming and generalizing these 3-count features slightly, further improvements are obtained. In addition, under several different experimental conditions, we compare 3-count and generalized 3-count features to new features derived from Kneser-Ney smoothing, to a new low-frequency penalty feature, and to several known smoothing/discounting schemes. Generalized 3-count performs similarly to or better than all of the smoothing methods except modified Kneser-Ney. In our experiments, the best phrase table (not language model) smoothing yields gains of +0.6 to +1.4 BLEU. |
---|
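
The relationship between the two RF features and the three underlying counts can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the reading of “forward” as P(target | source) is an assumption, and using the logarithm of each raw count as the 3-count feature value is likewise an assumption made for illustration. Under those assumptions, each RF feature in a log-linear model is a difference of two log counts, so the 3-count features simply let the tuner weight each count independently.

```python
import math

def relative_frequency_features(joint, src_marginal, tgt_marginal):
    """Log RF estimates from the three counts of one phrase pair.

    Assumption: "forward" means P(target | source) and "backward"
    means P(source | target); the abstract does not fix the direction.
    """
    forward = joint / src_marginal    # joint count over source marginal
    backward = joint / tgt_marginal   # joint count over target marginal
    return math.log(forward), math.log(backward)

def three_count_features(joint, src_marginal, tgt_marginal):
    """Hypothetical "unpacked" features: one log value per raw count."""
    return math.log(joint), math.log(src_marginal), math.log(tgt_marginal)

# Made-up counts for a single phrase pair.
log_fwd, log_bwd = relative_frequency_features(3, 10, 6)
log_joint, log_src, log_tgt = three_count_features(3, 10, 6)

# Each log RF feature is a fixed difference of two log counts.
print(math.isclose(log_fwd, log_joint - log_src))
print(math.isclose(log_bwd, log_joint - log_tgt))
```

With two RF features the model can only weight these fixed differences, whereas three separate count features give the weight tuner one extra degree of freedom, which is one plausible reading of why unpacking might help.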