Semi-supervised Document Classification with a Mislabeling Error Model

By National Research Council of Canada

Download	Accepted manuscript: Semi-supervised Document Classification with a Mislabeling Error Model (PDF, 647 KiB)
Authors	Krithara, Anastasia; Amini, Massih R.; Renders, Jean-Michel; Goutte, Cyril¹
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Conference	Advances in Information Retrieval, 30th European Conference on IR Research (ECIR'08), Glasgow, UK, March 30 - April 3, 2008
Abstract	This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification when the training set is only partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its own labeling errors. These probabilities are then taken into account when estimating the new model parameters in the next round. Our approach outperforms an earlier semi-supervised extension of PLSA introduced in [9], which relies on fake labels, while retaining that model's simplicity and its ability to solve multiclass problems. In addition, it provides valuable information about which classes are the most uncertain and difficult to label. We perform experiments on the 20Newsgroups, WebKB and Reuters document collections and show that our approach is more effective than two other semi-supervised algorithms on these text classification problems.
Publication date	2008
In	Proceedings. The 30th European Conference on Information Retrieval (ECIR 2008) (2008).
Language	English
Peer reviewed	Yes
NRC number	NRCC 50728
NPARC number	16435926
Record identifier	96fa3c52-f816-42de-8099-fd7df5fe6de5
Record created	2010-11-25
Record modified	2020-04-15

Date modified: 2024-04-18