Relevance popularity: A term event model based feature selection scheme for text classification
| Published in: | PLoS One vol. 12, no. 4 (Apr 2017), p. e0174341 |
|---|---|
| Main Author: | |
| Other Authors: | |
| Published: | Public Library of Science |
| Subjects: | |
| Summary: | Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. Traditional feature selection methods such as information gain and chi-square often rely on the number of documents that contain a particular term (i.e., the document frequency). However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. Under the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. We then derive a feature selection measure for each term by replacing the inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms the representative feature selection methods. |
|---|---|
| ISSN: | 1932-6203 |
| DOI: | 10.1371/journal.pone.0174341 |
| Source: | Health & Medical Collection |
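
The contrast the summary draws between document frequency and within-document term frequency can be illustrated with a small sketch. This is not the paper's feature selection measure, which the record does not reproduce. It assumes a toy corpus, hypothetical helper names (`term_statistics`, `log_ratio_scores`), and a plain Laplace-smoothed multinomial log-probability ratio as a stand-in score.

```python
import math
from collections import Counter

def term_statistics(docs, labels, target):
    """Per-term document frequency and term frequency, split by class membership."""
    df = {True: Counter(), False: Counter()}
    tf = {True: Counter(), False: Counter()}
    for tokens, label in zip(docs, labels):
        in_class = (label == target)
        tf[in_class].update(tokens)       # counts every occurrence of a term
        df[in_class].update(set(tokens))  # counts each document at most once
    return df, tf

def log_ratio_scores(tf, alpha=1.0):
    """Illustrative score: log P(t | class) - log P(t | rest), Laplace-smoothed.

    Assumption: a stand-in for a probability-ratio score, not the paper's formula.
    """
    vocab = set(tf[True]) | set(tf[False])
    total_in = sum(tf[True].values()) + alpha * len(vocab)
    total_out = sum(tf[False].values()) + alpha * len(vocab)
    return {
        t: math.log((tf[True][t] + alpha) / total_in)
        - math.log((tf[False][t] + alpha) / total_out)
        for t in vocab
    }

# Toy corpus (invented for illustration only).
docs = [
    "goal goal goal match".split(),
    "goal match team".split(),
    "match vote".split(),
    "vote vote party".split(),
]
labels = ["sport", "sport", "politics", "politics"]

df, tf = term_statistics(docs, labels, target="sport")
scores = log_ratio_scores(tf)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy corpus, "goal" and "match" each occur in two "sport" documents, so their in-class document frequency is the same, while their in-class term frequencies differ. A score built from term frequencies can therefore separate terms that document frequency alone treats alike.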