Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning

Bibliographic details
Published in: arXiv.org (Dec 22, 2024), p. n/a
First author: Harilal, Nidhin
Other authors: Rege, Amit Kiran; Akbarian Bafghi, Reza; Raissi, Maziar; Monteleoni, Claire
Publisher: Cornell University Library, arXiv.org
More bibliographic details
Abstract: Self-supervised learning (SSL) has revolutionized learning from large-scale unlabeled datasets, yet the intrinsic relationship between pretraining data and the learned representations remains poorly understood. Traditional supervised learning benefits from gradient-based data attribution tools like influence functions that measure the contribution of an individual data point to model predictions. However, existing definitions of influence rely on labels, making them unsuitable for SSL settings. We address this gap by introducing Influence-SSL, a novel and label-free approach for defining influence functions tailored to SSL. Our method harnesses the stability of learned representations against data augmentations to identify training examples that help explain model predictions. We provide both theoretical foundations and empirical evidence to show the utility of Influence-SSL in analyzing pre-trained SSL models. Our analysis reveals notable differences in how SSL models respond to influential data compared to supervised models. Finally, we validate the effectiveness of Influence-SSL through applications in duplicate detection, outlier identification, and fairness analysis. Code is available at: \url{https://github.com/cryptonymous9/Influence-SSL}.
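The abstract's key idea is that an example's influence can be probed without labels by checking how stable its learned representation is under data augmentations. The sketch below illustrates that general idea in minimal NumPy: it scores one example by the spread of its (normalized) embeddings across several random augmented views. The function name, the toy linear encoder, and the noise augmentation are all hypothetical choices for illustration; this is not the paper's actual Influence-SSL definition, whose exact formulation lives in the linked repository.

```python
import numpy as np

def augmentation_stability(embed, x, augment, n_aug=8, seed=None):
    """Hypothetical label-free stability score for one example.

    Embeds n_aug randomly augmented views of x, projects them onto the
    unit sphere, and returns the mean squared deviation of the views
    from their centroid. Lower scores mean a more augmentation-stable
    representation. Illustrative proxy only, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    views = np.stack([embed(augment(x, rng)) for _ in range(n_aug)])
    # Normalize each view so the score reflects angular spread,
    # matching the cosine geometry common in SSL objectives.
    views /= np.linalg.norm(views, axis=1, keepdims=True)
    center = views.mean(axis=0)
    return float(np.mean(np.sum((views - center) ** 2, axis=1)))

# Toy demo: an identity "encoder" and additive-noise "augmentation"
# (both placeholders for a real SSL backbone and augmentation pipeline).
embed = lambda v: np.eye(4) @ v
augment = lambda v, rng: v + 0.05 * rng.standard_normal(v.shape)
score = augmentation_stability(embed, np.ones(4), augment, seed=0)
```

Examples whose score is unusually high (embeddings that move a lot under augmentation) are natural candidates for the duplicate-, outlier-, and fairness-oriented analyses the abstract mentions.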
ISSN:2331-8422
Source: Engineering Database