FeatNavigator: Automatic Feature Augmentation on Tabular Data

Detailed bibliography
Published in: arXiv.org (Jun 13, 2024), p. n/a
Main author: Liang, Jiaming
Other authors: Lei, Chuan, Xiao, Qin, Zhang, Jiani, Katsifodimos, Asterios, Faloutsos, Christos, Rangwala, Huzefa
Published by: Cornell University Library, arXiv.org
Subjects: Search algorithms; Machine learning; Prediction models
Online access: Citation/Abstract; Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3068910485
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3068910485 
045 0 |b d20240613 
100 1 |a Liang, Jiaming 
245 1 |a FeatNavigator: Automatic Feature Augmentation on Tabular Data 
260 |b Cornell University Library, arXiv.org  |c Jun 13, 2024 
513 |a Working Paper 
520 3 |a Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality features in relational tables for ML models. FeatNavigator evaluates a feature from two aspects: (1) the intrinsic value of a feature towards an ML task (i.e., feature importance) and (2) the efficacy of a join path connecting the feature to the base table (i.e., integration quality). FeatNavigator strategically selects a small set of available features and their corresponding join paths to train a feature importance estimation model and an integration quality prediction model. Furthermore, FeatNavigator's search algorithm exploits both estimated feature importance and integration quality to identify the optimized feature augmentation plan. Our experimental results show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance. 
653 |a Search algorithms 
653 |a Machine learning 
653 |a Prediction models 
700 1 |a Lei, Chuan 
700 1 |a Xiao, Qin 
700 1 |a Zhang, Jiani 
700 1 |a Katsifodimos, Asterios 
700 1 |a Faloutsos, Christos 
700 1 |a Rangwala, Huzefa 
773 0 |t arXiv.org  |g (Jun 13, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3068910485/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2406.09534
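The abstract (field 520) says FeatNavigator scores each candidate feature on two axes, its estimated importance for the ML task and the predicted integration quality of the join path that reaches it, and then searches for an augmentation plan using both. The snippet below is only a minimal toy sketch of that idea, not the authors' algorithm. It assumes each (feature, join path) pair already carries both estimates, combines them by simple multiplication (an assumption made for illustration), keeps the best-scoring path per feature, and greedily takes the top `budget` features. `Candidate`, `plan_augmentation`, the `budget` parameter, and the example table and feature names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    """Hypothetical record: one feature reachable from the base table via one join path."""
    feature: str
    join_path: List[str]          # tables from the base table to the feature's table
    importance: float             # assumed output of a feature importance estimator, in [0, 1]
    integration_quality: float    # assumed output of an integration quality predictor, in [0, 1]


def plan_augmentation(candidates: List[Candidate], budget: int) -> List[Candidate]:
    """Toy greedy planner (hypothetical; not FeatNavigator's actual search).

    Scores each candidate as importance * integration_quality, keeps the
    best-scoring join path for every feature, and returns the top `budget`.
    """
    best_per_feature: Dict[str, Tuple[float, Candidate]] = {}
    for c in candidates:
        score = c.importance * c.integration_quality
        current = best_per_feature.get(c.feature)
        if current is None or score > current[0]:
            best_per_feature[c.feature] = (score, c)
    # Sort by score only, so Candidate objects are never compared directly.
    ranked = sorted(best_per_feature.values(), key=lambda pair: pair[0], reverse=True)
    return [c for _, c in ranked[:budget]]


# Hypothetical example: "avg_income" is reachable by two join paths of different quality.
candidates = [
    Candidate("avg_income", ["customers", "zip"], 0.8, 0.9),
    Candidate("avg_income", ["customers", "stores", "zip"], 0.8, 0.5),
    Candidate("store_size", ["customers", "stores"], 0.6, 0.95),
    Candidate("weather", ["customers", "zip", "weather"], 0.3, 0.7),
]
plan = plan_augmentation(candidates, budget=2)
print([(c.feature, c.join_path) for c in plan])
# [('avg_income', ['customers', 'zip']), ('store_size', ['customers', 'stores'])]
```

Choosing one path per feature before ranking reflects the abstract's point that a single distant feature may be reachable by many join paths, and that the path matters as much as the feature. A real planner would also have to account for interactions between selected features and for the cost of materializing joins, which this sketch ignores.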