Feature Substitution Using Latent Dirichlet Allocation for Text Classification

Bibliographic Details
Published in:	International Journal of Advanced Computer Science and Applications vol. 16, no. 1 (2025)
Main Author:	PDF
Published:	Science and Information (SAI) Organization Limited
Subjects:	Accuracy; Datasets; Markov chains; Classification; Natural language processing; Substitutes; Sentiment analysis; Words (language); Short message service; Documents; Text categorization; Computer science; Data mining; Feature selection; Efficiency; Statistical analysis; Machine learning; Medical research; Semantic analysis; Informatics; Semantics; Markov analysis
Online Access:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3168740433
003	UK-CbPIL
022			\|a 2158-107X
022			\|a 2156-5570
024	7		\|a 10.14569/IJACSA.2025.01601105 \|2 doi
035			\|a 3168740433
045	2		\|b d20250101 \|b d20251231
100	1		\|a PDF
245	1		\|a Feature Substitution Using Latent Dirichlet Allocation for Text Classification
260			\|b Science and Information (SAI) Organization Limited \|c 2025
513			\|a Journal Article
520	3		\|a Text classification plays a pivotal role in natural language processing, enabling applications such as product categorization, sentiment analysis, spam detection, and document organization. Traditional methods, including bag-of-words and TF-IDF, often lead to high-dimensional feature spaces, increasing computational complexity and susceptibility to overfitting. This study introduces a novel Feature Substitution technique using Latent Dirichlet Allocation (FS-LDA), which enhances text representation by replacing non-overlapping high-probability topic words. FS-LDA effectively reduces dimensionality while retaining essential semantic features, optimizing classification accuracy and efficiency. Experimental evaluations on five e-commerce datasets and an SMS spam dataset demonstrated that FS-LDA, combined with Hidden Markov Models (HMMs), achieved up to 95% classification accuracy in binary tasks and significant improvements in macro and weighted F1-scores for multiclass tasks. The innovative approach lies in FS-LDA's ability to seamlessly integrate dimensionality reduction with feature substitution, while its predictive advantage is demonstrated through consistent performance enhancement across diverse datasets. Future work will explore its application to other classification models and domains, such as social media analysis and medical document categorization, to further validate its scalability and robustness.
653			\|a Accuracy
653			\|a Datasets
653			\|a Markov chains
653			\|a Classification
653			\|a Natural language processing
653			\|a Substitutes
653			\|a Sentiment analysis
653			\|a Words (language)
653			\|a Short message service
653			\|a Documents
653			\|a Text categorization
653			\|a Computer science
653			\|a Data mining
653			\|a Feature selection
653			\|a Efficiency
653			\|a Statistical analysis
653			\|a Machine learning
653			\|a Medical research
653			\|a Semantic analysis
653			\|a Informatics
653			\|a Semantics
653			\|a Markov analysis
773	0		\|t International Journal of Advanced Computer Science and Applications \|g vol. 16, no. 1 (2025)
786	0		\|d ProQuest \|t Advanced Technologies & Aerospace Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3168740433/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3168740433/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch