Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools

Bibliographic Details
Published in: arXiv.org (Jul 26, 2022), p. n/a
Main Author: Brown, Davis
Other Authors: Kvinge, Henry
Published:
Cornell University Library, arXiv.org
Subjects: Deep learning; Machine learning; Safety critical
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2582280395
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2582280395 
045 0 |b d20220726 
100 1 |a Brown, Davis 
245 1 |a Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools 
260 |b Cornell University Library, arXiv.org  |c Jul 26, 2022 
513 |a Working Paper 
520 3 |a Methods for model explainability have become increasingly critical for testing the fairness and soundness of deep learning. Concept-based interpretability techniques, which use a small set of human-interpretable concept exemplars in order to measure the influence of a concept on a model's internal representation of input, are an important thread in this line of research. In this work we show that these explainability methods can suffer the same vulnerability to adversarial attacks as the models they are meant to analyze. We demonstrate this phenomenon on two well-known concept-based interpretability methods: TCAV and faceted feature visualization. We show that by carefully perturbing the examples of the concept that is being investigated, we can radically change the output of the interpretability method. The attacks that we propose can either induce positive interpretations (polka dots are an important concept for a model when classifying zebras) or negative interpretations (stripes are not an important factor in identifying images of a zebra). Our work highlights the fact that in safety-critical applications, there is need for security around not only the machine learning pipeline but also the model interpretation process. 
653 |a Deep learning 
653 |a Machine learning 
653 |a Safety critical 
700 1 |a Kvinge, Henry 
773 0 |t arXiv.org  |g (Jul 26, 2022), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2582280395/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2110.07120
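
The abstract above describes perturbing the concept exemplars supplied to a concept-based interpretability method (such as TCAV) so that the reported importance of a concept changes. Below is a minimal sketch of that idea on a purely synthetic setup; the ToyNet model, the random tensors standing in for images, the difference-of-means concept vector (TCAV proper fits a linear probe), and the soft anti-alignment objective are all assumptions of this sketch, not the authors' attack.

# Illustrative sketch only (not the paper's code or data): a toy TCAV-style
# concept-importance score and an adversarial perturbation of the concept
# exemplars intended to push that score down.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyNet(nn.Module):
    """Stand-in for a frozen, pretrained classifier with a bottleneck layer."""
    def __init__(self, d_in=32, d_mid=16, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(d_in, d_mid), nn.ReLU())  # bottleneck
        self.head = nn.Sequential(nn.Linear(d_mid, d_mid), nn.ReLU(),
                                  nn.Linear(d_mid, n_classes))

    def forward(self, x):
        return self.head(self.features(x))

def concept_vector(model, concept_x, random_x):
    """Concept direction as the unit-normalised difference of mean bottleneck
    activations (a simplification of TCAV's linear probe)."""
    with torch.no_grad():
        v = model.features(concept_x).mean(0) - model.features(random_x).mean(0)
    return v / v.norm()

def tcav_score(model, cav, class_x, target_class):
    """Fraction of class inputs whose target-class logit increases along the CAV."""
    acts = model.features(class_x).detach().requires_grad_(True)
    logit_sum = model.head(acts)[:, target_class].sum()
    grads, = torch.autograd.grad(logit_sum, acts)
    return (grads @ cav > 0).float().mean().item()

model = ToyNet().eval()
concept = torch.randn(64, 32)    # exemplars of the probed concept
randoms = torch.randn(64, 32)    # random counter-examples
class_x = torch.randn(128, 32)   # inputs of the class being explained

before = tcav_score(model, concept_vector(model, concept, randoms), class_x, 0)

# The model is frozen, so the class-logit gradients at the bottleneck are fixed;
# compute them once and reuse them in the attack loop.
acts = model.features(class_x).detach().requires_grad_(True)
class_grads, = torch.autograd.grad(model.head(acts)[:, 0].sum(), acts)
random_mean = model.features(randoms).mean(0).detach()

# Attack sketch: nudge the concept exemplars so the induced concept vector
# anti-aligns with the class gradients (a soft proxy for driving the 0/1 TCAV
# score toward 0), while a penalty keeps the exemplars close to the originals.
perturbed = concept.clone().requires_grad_(True)
opt = torch.optim.Adam([perturbed], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    v = model.features(perturbed).mean(0) - random_mean
    v = v / v.norm()
    loss = (class_grads @ v).mean() + 0.1 * (perturbed - concept).pow(2).mean()
    loss.backward()
    opt.step()

after = tcav_score(model, concept_vector(model, perturbed.detach(), randoms), class_x, 0)
print(f"TCAV-style score: before {before:.2f}, after attack {after:.2f}")

Note that this sketch only shows the suppressive direction (making a concept look unimportant) on a toy model; per the abstract, the paper's attacks target TCAV and faceted feature visualization on real image classifiers and can push interpretations in either direction.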