IncomeSCM: From tabular data set to time-series simulator and causal estimation benchmark

Bibliographic Details
Published in: arXiv.org (Oct 28, 2024), p. n/a
Main author: Johansson, Fredrik D
Published: Cornell University Library, arXiv.org
Subjects: Simulators; Datasets; Estimators; Configurable programs; Benchmarks
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3063930225
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3063930225 
045 0 |b d20241028 
100 1 |a Johansson, Fredrik D 
245 1 |a IncomeSCM: From tabular data set to time-series simulator and causal estimation benchmark 
260 |b Cornell University Library, arXiv.org  |c Oct 28, 2024 
513 |a Working Paper 
520 3 |a Evaluating observational estimators of causal effects demands information that is rarely available: unconfounded interventions and outcomes from the population of interest, created either by randomization or adjustment. As a result, it is customary to fall back on simulators when creating benchmark tasks. Simulators offer great control but are often too simplistic to make challenging tasks, either because they are hand-designed and lack the nuances of real-world data, or because they are fit to observational data without structural constraints. In this work, we propose a general, repeatable strategy for turning observational data into sequential structural causal models and challenging estimation tasks by following two simple principles: 1) fitting real-world data where possible, and 2) creating complexity by composing simple, hand-designed mechanisms. We implement these ideas in a highly configurable software package and apply it to the well-known Adult income data set to construct the IncomeSCM simulator. From this, we devise multiple estimation tasks and sample data sets to compare established estimators of causal effects. The tasks present a suitable challenge, with effect estimates varying greatly in quality between methods, despite similar performance in the modeling of factual outcomes, highlighting the need for dedicated causal estimators and model selection criteria. 
653 |a Simulators 
653 |a Datasets 
653 |a Estimators 
653 |a Configurable programs 
653 |a Benchmarks 
773 0 |t arXiv.org  |g (Oct 28, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3063930225/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2405.16069