Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning

Bibliographic Details
Published in: arXiv.org (Dec 22, 2024), p. n/a
Main author: Harilal, Nidhin
Other authors: Rege, Amit Kiran; Reza Akbarian Bafghi; Raissi, Maziar; Monteleoni, Claire
Published: Cornell University Library, arXiv.org
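Subjects: Outliers (statistics); Data analysis; Data augmentation; Self-supervised learning; Labels; Influence functions; Harnesses; Representations; Data points; Stability augmentation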
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3148949001
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3148949001 
045 0 |b d20241222 
100 1 |a Harilal, Nidhin 
245 1 |a Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning 
260 |b Cornell University Library, arXiv.org  |c Dec 22, 2024 
513 |a Working Paper 
520 3 |a Self-supervised learning (SSL) has revolutionized learning from large-scale unlabeled datasets, yet the intrinsic relationship between pretraining data and the learned representations remains poorly understood. Traditional supervised learning benefits from gradient-based data attribution tools like influence functions that measure the contribution of an individual data point to model predictions. However, existing definitions of influence rely on labels, making them unsuitable for SSL settings. We address this gap by introducing Influence-SSL, a novel and label-free approach for defining influence functions tailored to SSL. Our method harnesses the stability of learned representations against data augmentations to identify training examples that help explain model predictions. We provide both theoretical foundations and empirical evidence to show the utility of Influence-SSL in analyzing pre-trained SSL models. Our analysis reveals notable differences in how SSL models respond to influential data compared to supervised models. Finally, we validate the effectiveness of Influence-SSL through applications in duplicate detection, outlier identification and fairness analysis. Code is available at: https://github.com/cryptonymous9/Influence-SSL. 
653 |a Outliers (statistics) 
653 |a Data analysis 
653 |a Data augmentation 
653 |a Self-supervised learning 
653 |a Labels 
653 |a Influence functions 
653 |a Harnesses 
653 |a Representations 
653 |a Data points 
653 |a Stability augmentation 
700 1 |a Rege, Amit Kiran 
700 1 |a Reza Akbarian Bafghi 
700 1 |a Raissi, Maziar 
700 1 |a Monteleoni, Claire 
773 0 |t arXiv.org  |g (Dec 22, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3148949001/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.17170