Semi-Supervised Self-Training for Sentence Subjectivity Classification

Conference: AI'08 (The 21st Canadian Conference on Artificial Intelligence), May 28-30, 2008, Windsor, Ontario
Abstract: Recent natural language processing (NLP) research shows that identifying and extracting subjective information from text can benefit many NLP applications. In this paper, we investigate a semi-supervised learning approach, self-training, for sentence subjectivity classification. In self-training, a confidence degree derived from the ranking of class membership probabilities is commonly used as the selection metric that ranks and selects the unlabeled instances for the next round of training of the underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates have good ranking performance. The first contribution of this paper is a study of the performance of self-training using decision tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is an adapted Value Difference Metric (VDM), proposed as a selection metric for self-training that does not depend on class membership probabilities. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments has been designed to compare the performance of self-training with different underlying classifiers and different selection metrics under various conditions. The experimental results show that the performance of self-training improves when VDM is used instead of the confidence degree, and that self-training with NBTree and VDM outperforms self-training with the other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve performance comparable to that of supervised learning models.
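The confidence-based selection loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small hand-written multinomial naive Bayes with add-one smoothing, sentences given as token lists, and hypothetical names (`train_nb`, `predict_proba`, `self_train`, `per_round`, `max_rounds`) chosen only for illustration. The adapted VDM metric is not shown because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model with add-one smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {class: posterior probability} for one tokenized sentence."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log(n_c / total_docs)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + 1) / denom)
        log_scores[c] = score
    # Normalize in log space to avoid underflow.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def self_train(labeled, unlabeled, per_round=1, max_rounds=10):
    """Generic self-training: each round, move the most confidently
    predicted unlabeled sentences into the training set."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_nb(docs, labels)
        scored = []
        for tokens in pool:
            proba = predict_proba(model, tokens)
            label = max(proba, key=proba.get)
            # Confidence here is simply the top class probability (an assumption).
            scored.append((proba[label], label, tokens))
        scored.sort(key=lambda item: item[0], reverse=True)
        for _conf, label, tokens in scored[:per_round]:
            docs.append(tokens)
            labels.append(label)
            pool.remove(tokens)
    return train_nb(docs, labels), list(zip(docs, labels))
```

Calling `self_train` with a few labeled token lists and a pool of unlabeled ones returns the final model together with the enlarged training set, including the labels the classifier assigned to itself along the way.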
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 50417
NPARC number: 8913184
Record identifier: 1256764d-560d-42bb-9ffd-5a36578f7804
Record created: 2009-04-22
Record modified: 2016-05-09