Relevance popularity: A term event model based feature selection scheme for text classification
| Published in: | PLoS One vol. 12, no. 4 (Apr 2017), p. e0174341 |
|---|---|
| Main author: | Feng, Guozhong |
| Other authors: | An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao |
| Published: | Public Library of Science |
| Subjects: | |
| Online access: | Citation/Abstract, Full Text, Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 1884473897 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 1932-6203 | ||
| 024 | 7 | |a 10.1371/journal.pone.0174341 |2 doi | |
| 035 | |a 1884473897 | ||
| 045 | 2 | |b d20170401 |b d20170430 | |
| 084 | |a 174835 |2 nlm | ||
| 100 | 1 | |a Feng, Guozhong | |
| 245 | 1 | |a Relevance popularity: A term event model based feature selection scheme for text classification | |
| 260 | |b Public Library of Science |c Apr 2017 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. Traditional feature selection methods such as information gain and chi-square often use the number of documents that contain a particular term (i.e., the document frequency). However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. Under the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. We then derive a feature selection measure for each term by replacing the inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms the representative feature selection methods. | |
| 651 | 4 | |a China | |
| 653 | |a Text categorization | ||
| 653 | |a Word sense disambiguation | ||
| 653 | |a Bayesian analysis | ||
| 653 | |a Computer science | ||
| 653 | |a Information storage | ||
| 653 | |a Bioinformatics | ||
| 653 | |a Studies | ||
| 653 | |a Classification | ||
| 653 | |a Neural networks | ||
| 653 | |a Relevance | ||
| 653 | |a Economic | ||
| 653 | |a Information processing | ||
| 653 | |a Methods | ||
| 653 | |a Algorithms | ||
| 653 | |a Text editing | ||
| 653 | |a Information theory | ||
| 653 | |a Information technology | ||
| 653 | |a Information retrieval | ||
| 653 | |a Numerical experiments | ||
| 653 | |a Artificial intelligence | ||
| 653 | |a Laboratories | ||
| 653 | |a Statistical analysis | ||
| 653 | |a Probabilistic models | ||
| 653 | |a Datasets | ||
| 653 | |a Support vector machines | ||
| 653 | |a Documents | ||
| 700 | 1 | |a An, Baiguo | |
| 700 | 1 | |a Yang, Fengqin | |
| 700 | 1 | |a Wang, Han | |
| 700 | 1 | |a Zhang, Libiao | |
| 773 | 0 | |t PLoS One |g vol. 12, no. 4 (Apr 2017), p. e0174341 | |
| 786 | 0 | |d ProQuest |t Health & Medical Collection | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/1884473897/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text |u https://www.proquest.com/docview/1884473897/fulltext/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/1884473897/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch |