Download | View accepted manuscript: Searching for poor quality machine translated text : learning the difference between human writing and machine translations (PDF, 564 KiB) |
---|
DOI | https://doi.org/10.1007/978-3-642-30353-1 |
---|
Author | Carter, Dave¹; Inkpen, Diana |
---|
Affiliation | ¹ National Research Council of Canada. NRC Institute for Information Technology |
---|
Format | Text, Article |
---|
Conference | 25th Canadian Conference on Artificial Intelligence, Canadian AI 2012, 28-30 May 2012, Toronto, Ontario, Canada |
---|
Abstract | As machine translation (MT) tools have become mainstream, machine translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation tools; large amounts of MT text in training data may make such products less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine translated text from human written text (both original text and human translations). Machine translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine translated versions of six Government of Canada web sites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario web sites using Government of Canada training data was unfruitful, with a high rate of false positives. Machine translated text appears to be learnable and detectable when using a similar training corpus. |
---|
Publication date | 2012-05 |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
NPARC number | 20496817 |
---|
Record identifier | cf9f7d1a-96a1-4b36-8355-6c808a7f3f4d |
---|
Record created | 2012-08-16 |
---|
Record modified | 2020-04-21 |
---|