Lexically-triggered hidden Markov models for clinical document coding

  1. (PDF, 314 KB)
AuthorSearch for: ; Search for:
Proceedings titleProceedings of the 49th Annual Meeting of the Association for Computational Linguistics
ConferenceACL HLT 2011 : The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 19-24, 2011, Portland, Oregon, USA
Pages742751; # of pages: 10
AbstractThe automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can be viewed as multi-label document classification, the coding problem has the interesting property that most code assignments can be supported by a single phrase found in the input document. We propose a Lexically-Triggered Hidden Markov Model (LT-HMM) that leverages these phrases to improve coding accuracy. The LT-HMM works in two stages: first, a lexical match is performed against a term dictionary to collect a set of candidate codes for a document. Next, a discriminative HMM selects the best subset of codes to assign to the document by tagging candidates as present or absent. By confirming codes proposed by a dictionary, the LT-HMM can share features across codes, enabling strong performance even on rare codes. In fact, we are able to recover codes that do not occur in the training set at all. Our approach achieves the best ever performance on the 2007 Medical NLP Challenge test set, with an F-measure of 89.84.
Publication date
AffiliationNRC Institute for Information Technology; National Research Council Canada
Peer reviewedNo
NPARC number18533387
Export citationExport as RIS
Report a correctionReport a correction
Record identifierc6ed239f-7c68-4c0e-bf11-a6a221afeeb4
Record created2011-09-07
Record modified2016-05-09
Bookmark and share
  • Share this page with Facebook (Opens in a new window)
  • Share this page with Twitter (Opens in a new window)
  • Share this page with Google+ (Opens in a new window)
  • Share this page with Delicious (Opens in a new window)
Date modified: