Relevance popularity: A term event model based feature selection scheme for text classification

Bibliographic details
Published in: PLoS One vol. 12, no. 4 (Apr 2017), p. e0174341
Main author: Feng, Guozhong
Other authors: An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
Published:
Public Library of Science
Subjects:
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 1884473897
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0174341  |2 doi 
035 |a 1884473897 
045 2 |b d20170401  |b d20170430 
084 |a 174835  |2 nlm 
100 1 |a Feng, Guozhong 
245 1 |a Relevance popularity: A term event model based feature selection scheme for text classification 
260 |b Public Library of Science  |c Apr 2017 
513 |a Journal Article 
520 3 |a Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results, obtained using two widely used text classifiers (naive Bayes and support vector machine), demonstrate that our method outperformed the representative feature selection methods. 
651 4 |a China 
653 |a Text categorization 
653 |a Word sense disambiguation 
653 |a Bayesian analysis 
653 |a Computer science 
653 |a Information storage 
653 |a Bioinformatics 
653 |a Studies 
653 |a Classification 
653 |a Neural networks 
653 |a Relevance 
653 |a Economic 
653 |a Information processing 
653 |a Methods 
653 |a Algorithms 
653 |a Text editing 
653 |a Information theory 
653 |a Information technology 
653 |a Information retrieval 
653 |a Numerical experiments 
653 |a Artificial intelligence 
653 |a Laboratories 
653 |a Statistical analysis 
653 |a Probabilistic models 
653 |a Datasets 
653 |a Support vector machines 
653 |a Documents 
700 1 |a An, Baiguo 
700 1 |a Yang, Fengqin 
700 1 |a Wang, Han 
700 1 |a Zhang, Libiao 
773 0 |t PLoS One  |g vol. 12, no. 4 (Apr 2017), p. e0174341 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/1884473897/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/1884473897/fulltext/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/1884473897/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch