Knowledge Extraction in Population Health Datasets: An Exploratory Data Mining Approach

Bibliographic Details
Published in: PQDT - Global (2018)
Main Author: Khangamwa, Gift
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Artificial intelligence
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3159789852
003 UK-CbPIL
020 |a 9798302306265 
035 |a 3159789852 
045 2 |b d20180101  |b d20181231 
084 |a 189128  |2 nlm 
100 1 |a Khangamwa, Gift 
245 1 |a Knowledge Extraction in Population Health Datasets: An Exploratory Data Mining Approach 
260 |b ProQuest Dissertations & Theses  |c 2018 
513 |a Dissertation/Thesis 
520 3 |a Machine learning and data mining techniques are increasingly used for knowledge extraction in health datasets. In this study, we used machine learning methods for data exploration and for building classifier models for anemia. Anemia is recognized as a crucial public health challenge that leads to poor health for mothers and infants, and one of its main causes is malaria. We used a dataset from Malawi, where the prevalence of both malaria and anemia remains high. We applied machine learning algorithms to knowledge extraction on the Malawi demographic and health datasets for the survey years 2004 and 2010. We followed the cross-industry standard process for data mining (CRISP-DM) methodology to guide our study. The dataset was obtained, cleaned and prepared for experimentation. Unsupervised machine learning methods were used to understand the nature of the dataset and the natural groupings in it, while supervised machine learning methods were used to build predictive models for anemia. Specifically, we used principal component analysis (PCA) and clustering algorithms in the unsupervised experiments, and support vector machines (SVM) and decision trees (DTs) in the supervised experiments. The unsupervised methods revealed no significant separation of clusters according to either the malaria or the anemia attributes. However, attributes such as age, economic status, health practices and the number of children a woman has were clustered in significantly different ways, i.e., young and old women fell into different clusters. The PCA results confirmed these findings. The supervised methods, on the other hand, showed that anemia classifiers could be developed for the dataset using SVM and DTs. 
The best-performing model attained an accuracy of 86%, a ROC area score of 86%, a mean absolute error of 0.27 and a kappa statistic of 0.78; it was built using an SVM with C = 100 and γ = 10^−18. The best DT model attained an accuracy of 73%, a ROC area score of 74%, a mean absolute error of 0.36 and a kappa statistic of 0.449. In conclusion, we successfully built a good anemia classifier using SVM and also showed the relationship between important attributes in the classification of anemia. 
653 |a Computer science 
653 |a Artificial intelligence 
773 0 |t PQDT - Global  |g (2018) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3159789852/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3159789852/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch