Text categorization for an online tendering system

Proceedings title: Proceedings of the Business Agents and Semantic Web Workshop (BASeWEB 04)
Conference: Business Agents and Semantic Web Workshop (BASeWEB 04), May 2004, London, Canada
Subject: text categorization; machine learning; Rocchio method; TF-IDF; WIDF; weighted inverse document frequency; naive Bayes classifier; ranking categorization
Abstract: This paper investigates the application of text categorization (TC) in a setting with a large number of target categories and relatively few training cases, applied to a real-life online tendering system. This is an experimental paper describing our experience with a real-life application of conventional machine learning approaches to TC, namely the Rocchio method, TF-IDF (term frequency-inverse document frequency), WIDF (weighted inverse document frequency), and naïve Bayes. To make the categorization results acceptable for industrial use, we exploited the hierarchical structure of the target categories and investigated semi-automated ranking categorization.
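The abstract names the weighting schemes and classifiers but gives no formulas. The Python sketch below shows, under our own assumptions, how TF-IDF or WIDF term weights might feed a Rocchio-style classifier that returns a ranked shortlist of candidate categories for a human to confirm, which is one plausible reading of "semi-automated ranking categorization". All function names, the `beta`/`gamma` defaults, and the toy data are hypothetical. The paper's actual preprocessing, parameter settings, use of the category hierarchy, and naïve Bayes variant are not reproduced here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """IDF per training term: log(N / number of documents containing the term)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float]) -> Vector:
    """TF-IDF weights (raw term frequency * IDF); unseen terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def widf_table(docs: List[List[str]]) -> Dict[str, int]:
    """Total frequency of each term over the whole training collection."""
    return Counter(term for doc in docs for term in doc)


def widf_vector(doc: List[str], totals: Dict[str, int]) -> Vector:
    """WIDF weights: in-document frequency divided by collection frequency."""
    return {t: c / totals[t] for t, c in Counter(doc).items() if t in totals}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_prototypes(vectors: List[Vector], labels: List[str],
                       beta: float = 1.0, gamma: float = 0.0) -> Dict[str, Vector]:
    """One prototype per category:
    beta * mean(positive vectors) - gamma * mean(negative vectors)."""
    prototypes: Dict[str, Vector] = {}
    for cat in set(labels):
        pos = [v for v, lab in zip(vectors, labels) if lab == cat]
        neg = [v for v, lab in zip(vectors, labels) if lab != cat]
        proto: Dict[str, float] = defaultdict(float)
        for v in pos:
            for t, w in v.items():
                proto[t] += beta * w / len(pos)
        for v in neg:
            for t, w in v.items():
                proto[t] -= gamma * w / len(neg)
        prototypes[cat] = dict(proto)
    return prototypes


def rank_categories(doc_vec: Vector, prototypes: Dict[str, Vector],
                    k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most similar categories, best first, so a person can
    choose the final label instead of accepting a single hard assignment."""
    scored = [(cat, cosine(doc_vec, proto)) for cat, proto in prototypes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, training on a few tokenized tender notices labelled "construction" or "it" and then ranking the query `"software for servers".split()` with `tfidf_vector` places "it" first. Swapping `tfidf_vector`/`idf_table` for `widf_vector`/`widf_table` changes only the term weighting; the ranking step stays the same.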
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21260516
Record identifier: a3b8b396-a184-43b9-b229-d47c4e95ed5c
Record created: 2013-03-05
Record modified: 2016-05-09