The NRC System for Discriminating Similar Languages

Proceedings title: Proceedings of the First Workshop on Applying NLP Tools to Similar Languages
Conference: First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, August 23-29, 2014, Dublin, Ireland
Pages: 139-145; # of pages: 7
Abstract: We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classifier with 99.99% accuracy on the five target groups. Within each group (except English), we use a voting combination of discriminative classifiers trained on a variety of feature spaces, achieving an average accuracy of 95.71%, with per-group accuracy between 90.95% and 100% depending on the group. This approach turns out to reach the best performance among all systems submitted to the open and closed tasks.
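The two-stage process described in the abstract can be illustrated with a minimal sketch. This is not the NRC system. It assumes character n-gram features, uses a multinomial naive Bayes model as the generative group classifier, and, as a loudly labeled stand-in for the discriminative classifiers, takes a majority vote over naive Bayes models trained on different n-gram orders (the "variety of feature spaces"). The class names, the n-gram orders, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character n-grams (a generative model)."""

    def __init__(self, n=3, alpha=1.0):
        self.n, self.alpha = n, alpha
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for label in sorted(self.docs):
            # Add-alpha smoothed log-likelihood plus log prior.
            denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            score += sum(
                math.log((self.counts[label][g] + self.alpha) / denom) for g in grams
            )
            if score > best_score:
                best, best_score = label, score
        return best


class TwoStage:
    """Stage 1 predicts the group; stage 2 votes on the language within it."""

    def __init__(self, orders=(1, 2, 3)):
        self.orders = orders          # assumed feature spaces: n-gram orders
        self.group_clf = NaiveBayes(n=3)
        self.voters = {}              # group -> list of within-group classifiers

    def fit(self, texts, groups, langs):
        self.group_clf.fit(texts, groups)
        by_group = defaultdict(list)
        for text, group, lang in zip(texts, groups, langs):
            by_group[group].append((text, lang))
        for group, pairs in by_group.items():
            g_texts, g_langs = zip(*pairs)
            # Stand-in for the discriminative classifiers: one model per order.
            self.voters[group] = [
                NaiveBayes(n=k).fit(g_texts, g_langs) for k in self.orders
            ]
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        votes = Counter(clf.predict(text) for clf in self.voters[group])
        return group, votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from the shared-task corpus.
TEXTS = [
    "dobrý den, jak se máte",
    "dobrý deň, ako sa máte",
    "o ônibus chegou ao ponto",
    "o autocarro chegou à paragem",
]
GROUPS = ["cs-sk", "cs-sk", "pt", "pt"]
LANGS = ["cs", "sk", "pt-BR", "pt-PT"]

model = TwoStage().fit(TEXTS, GROUPS, LANGS)
group, lang = model.predict("dobrý den")
```

Because each set of voters is trained only on the languages of its own group, the second stage can only return a language belonging to the group chosen in the first stage; an error in the first stage therefore cannot be corrected later, which is why a highly accurate group classifier matters in this design.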
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: No
NPARC number: 21275282
Record identifier: bd4a662e-ed67-47ef-8165-abde04de494c
Record created: 2015-05-28
Record modified: 2016-05-09