Detection of Multiple Influential Observations on Model Selection

Bibliographic Details
Published in: arXiv.org (Dec 4, 2024), p. n/a
Main author: Zhang, Dongliang
Other authors: Asgharian, Masoud; Lindquist, Martin A
Published:
Cornell University Library, arXiv.org
Subjects:
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3141256594
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3141256594 
045 0 |b d20241204 
100 1 |a Zhang, Dongliang 
245 1 |a Detection of Multiple Influential Observations on Model Selection 
260 |b Cornell University Library, arXiv.org  |c Dec 4, 2024 
513 |a Working Paper 
520 3 |a Outlying observations are frequently encountered in a wide spectrum of scientific domains, posing significant challenges for the generalizability of statistical models and the reproducibility of downstream analysis. These observations can be identified through influential diagnosis, which refers to the detection of observations that are unduly influential on diverse facets of statistical inference. To date, methods for identifying observations influencing the choice of a stochastically selected submodel have been underdeveloped, especially in the high-dimensional setting where the number of predictors p exceeds the sample size n. Recently we proposed an improved diagnostic measure to handle this setting. However, its distributional properties and approximations have not yet been explored. To address this shortcoming, the notion of exchangeability is revived, and used to determine the exact finite- and large-sample distributions of our assessment metric. This forms the foundation for the introduction of both parametric and non-parametric approaches for its approximation and the establishment of thresholds for diagnosis. The resulting framework is extended to logistic regression models, followed by a simulation study conducted to assess the performance of various detection procedures. Finally the framework is applied to data from an fMRI study of thermal pain, with the goal of identifying outlying subjects that could distort the formulation of statistical models using functional brain activity in predicting physical pain ratings. Both linear and logistic regression models are used to demonstrate the benefits of detection and compare the performances of different detection procedures. In particular, two additional influential observations are identified, which are not discovered by previous studies. 
653 |a Statistical methods 
653 |a Diagnosis 
653 |a Regression analysis 
653 |a Pain 
653 |a Thermal simulation 
653 |a Statistical analysis 
653 |a Statistical models 
653 |a Regression models 
653 |a Statistical inference 
653 |a Approximation 
700 1 |a Asgharian, Masoud 
700 1 |a Lindquist, Martin A 
773 0 |t arXiv.org  |g (Dec 4, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3141256594/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.02945