Experiments in discriminating similar languages

Proceedings title: Proceedings of LT4VarDial
Conference: LT4VarDial - Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, September 10th, 2015, Hissar, Bulgaria
Abstract: We describe the system built by the National Research Council (NRC) Canada for the 2015 shared task on Discriminating between similar languages. The NRC system uses various statistical classifiers trained on character and word ngram features. Predictions rely on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. This year, we focused on two issues: 1) the ngram generation process, and 2) the handling of the anonymized (“blinded”) Named Entities. Despite the slightly harder experimental conditions this year, our systems achieved an average accuracy of 95.24% (closed task) and 95.65% (open task), ending up second or (close) third on the closed task, and first on the open task.
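The two-stage prediction described in the abstract (language group first, then language or variant within the group) can be illustrated with a minimal sketch. This is not the NRC system: the abstract does not name its classifiers, so a small multinomial naive Bayes over character n-grams is swapped in here as an assumption, word n-grams and the handling of blinded named entities are left out, and the class names, the `GROUP_OF` mapping and the four toy training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max, with space padding."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing).

    Assumption: stands in for the unspecified "statistical classifiers".
    """

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        n_docs = sum(self.priors.values())

        def score(label):
            counts = self.counts[label]
            denom = self.totals[label] + len(self.vocab) + 1
            log_prior = math.log(self.priors[label] / n_docs)
            return log_prior + sum(math.log((counts[g] + 1) / denom) for g in grams)

        return max(self.priors, key=score)


class TwoStageClassifier:
    """Stage 1 predicts the language group; stage 2 picks a language within it."""

    def fit(self, texts, languages, group_of):
        groups = [group_of[lang] for lang in languages]
        self.group_clf = NaiveBayes().fit(texts, groups)
        self.within = {}
        for group in set(groups):
            idx = [i for i, g in enumerate(groups) if g == group]
            self.within[group] = NaiveBayes().fit(
                [texts[i] for i in idx], [languages[i] for i in idx])
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        return self.within[group].predict(text)


# Hypothetical toy data, for illustration only (not the shared-task corpus).
GROUP_OF = {"pt-BR": "pt", "pt-PT": "pt", "es-ES": "es", "es-AR": "es"}
TRAIN = [
    ("o ônibus chegou ao aeroporto", "pt-BR"),
    ("o autocarro chegou ao aeroporto", "pt-PT"),
    ("el autobús llegó al aeropuerto", "es-ES"),
    ("el colectivo llegó al aeropuerto", "es-AR"),
]
texts, languages = (list(column) for column in zip(*TRAIN))
clf = TwoStageClassifier().fit(texts, languages, GROUP_OF)
print(clf.predict("o autocarro chegou"))
```

Training one within-group classifier per group means each second-stage model only has to separate closely related variants, which is the motivation the abstract gives for the two-stage design.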
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21276326
Record identifier: 884cac9b-7d70-4078-9542-3f5980852d99
Record created: 2015-10-02
Record modified: 2016-05-09