Moving towards more holistic machine learning-based approaches for classification problems in animal studies

Bibliographic details
Publication: bioRxiv (Jan 27, 2025)
First author: Christensen, Charlotte
Other authors: Ferreira, Andre; Wismer Cherono; Maximiadi, Maria; Nyaguthii, Brendah; Ogino, Mina; Herrera, Daniel; Farine, Damien
Publisher: Cold Spring Harbor Laboratory Press
Subject:
Online access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3160209667
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2024.10.18.618969  |2 doi 
035 |a 3160209667 
045 0 |b d20250127 
100 1 |a Christensen, Charlotte 
245 1 |a Moving towards more holistic machine learning-based approaches for classification problems in animal studies 
260 |b Cold Spring Harbor Laboratory Press  |c Jan 27, 2025 
513 |a Working Paper 
520 3 |a Machine learning (ML) is revolutionizing field and laboratory studies of animals. However, a challenge when deploying ML for classification tasks is ensuring the models are reliable. Currently, we evaluate models using performance metrics (e.g., precision, recall, F1), but these can overlook the ultimate aim, which is not the outputs themselves (e.g. detected species or individual identities, or behaviour) but their incorporation into hypothesis testing. As improving performance metrics has diminishing returns, particularly when data are inherently noisy (as human-labelled, animal-based data often are), researchers face the conundrum of investing more time in maximising metrics versus doing the actual research. This raises the question: how much noise can we accept in ML models? Here, we start by describing an under-reported factor that can cause metrics to underestimate model performance. Specifically, ambiguity between categories or mistakes in labelling validation data produces hard ceilings that limit performance metrics. This likely widespread issue means that many models could be performing better than their metrics suggest. Next, we argue and show that imperfect models (e.g. low F1 scores) can still be usable. Using a case study on ML-identified behaviour from vulturine guineafowl accelerometer data, we first propose a simulation framework to evaluate the robustness of hypothesis testing using models that make classification errors. Second, we show how to determine the utility of a model by supplementing existing performance metrics with 'biological validations'. This involves applying ML models to unlabelled data and using the models' outputs to test hypotheses for which we can anticipate the outcome. Together, we show that effect sizes and expected biological patterns can be detected even when performance metrics are relatively low (e.g., F1: 60-70%).
In doing so, we provide a roadmap for validation approaches of ML classification models tailored to research in animal behaviour, and other fields with noisy, biological data.
Competing Interest Statement: The authors have declared no competing interest.
Footnotes: * Revision defines the scope of the paper more clearly (using machine-learning for the classification of raw data to be used in posterior hypothesis testing). Revision entails additional methodological details (Ethical note, Figure S2 to show alignment of accelerometer data with labels).
653 |a Machine learning 
653 |a Hypothesis testing 
653 |a Animal models 
653 |a Hypotheses 
653 |a Classification 
653 |a Learning algorithms 
653 |a Business metrics 
700 1 |a Ferreira, Andre 
700 1 |a Wismer Cherono 
700 1 |a Maximiadi, Maria 
700 1 |a Nyaguthii, Brendah 
700 1 |a Ogino, Mina 
700 1 |a Herrera, Daniel 
700 1 |a Farine, Damien 
773 0 |t bioRxiv  |g (Jan 27, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3160209667/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3160209667/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2024.10.18.618969v2
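The simulation framework described in the abstract (checking whether a hypothesised behavioural effect survives classification errors) can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' code: the group probabilities, sample size, and error rate below are assumptions chosen purely for demonstration.

```python
# Hypothetical sketch of the abstract's idea: inject classification errors
# into simulated behaviour labels and ask whether a known group difference
# (the "effect") is still detectable. Not the authors' implementation.
import random

random.seed(42)

def observed_effect(n=2000, p_a=0.6, p_b=0.4, error_rate=0.3):
    """Simulate binary behaviour labels for two groups with true rates
    p_a and p_b, then flip a fraction (error_rate) of labels to mimic
    classifier mistakes. Returns the observed difference in rates."""
    def noisy_group(p):
        true = [1 if random.random() < p else 0 for _ in range(n)]
        return [1 - x if random.random() < error_rate else x for x in true]
    a, b = noisy_group(p_a), noisy_group(p_b)
    return sum(a) / n - sum(b) / n

effect_clean = observed_effect(error_rate=0.0)  # perfect classifier
effect_noisy = observed_effect(error_rate=0.3)  # error-prone classifier
```

With symmetric label noise at rate e, the observed rate difference shrinks by a factor of (1 - 2e) but is not erased, so the direction of a true effect can remain detectable even when classifier performance is well below perfect.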