AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis

Published in: arXiv.org (Dec 10, 2024), p. n/a
Main author: He, Shidan
Other authors: Liu, Lei; Zhao, Shen
Published:
Cornell University Library, arXiv.org
Subjects: Feature extraction; Semantics; Controllability; Images; Signal generation; Anomalies; Realism; Synthesis
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3143054423
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3143054423 
045 0 |b d20241210 
100 1 |a He, Shidan 
245 1 |a AnomalyControl: Learning Cross-modal Semantic Features for Controllable Anomaly Synthesis 
260 |b Cornell University Library, arXiv.org  |c Dec 10, 2024 
513 |a Working Paper 
520 3 |a Anomaly synthesis is a crucial approach to augmenting abnormal data for advancing anomaly inspection. Building on knowledge from large-scale pre-training, existing text-to-image anomaly synthesis methods predominantly rely on textual information or coarsely aligned visual features to guide the entire generation process. However, these methods often lack sufficient descriptors to capture the complicated characteristics of realistic anomalies (e.g., the fine-grained visual patterns of anomalies), limiting the realism and generalization of the generation process. To this end, we propose a novel anomaly synthesis framework called AnomalyControl that learns cross-modal semantic features as guidance signals, which encode generalized anomaly cues from text-image reference prompts and improve the realism of synthesized abnormal samples. Specifically, AnomalyControl adopts a flexible, non-matching prompt pair (i.e., a text-image reference prompt and a targeted text prompt), in which a Cross-modal Semantic Modeling (CSM) module extracts cross-modal semantic features from the textual and visual descriptors. An Anomaly-Semantic Enhanced Attention (ASEA) mechanism is then formulated to allow CSM to focus on the specific visual patterns of the anomaly, enhancing the realism and contextual relevance of the generated anomaly features. Treating the cross-modal semantic features as a prior, a Semantic Guided Adapter (SGA) encodes effective guidance signals for an adequate and controllable synthesis process. Extensive experiments indicate that AnomalyControl achieves state-of-the-art results in anomaly synthesis compared with existing methods while exhibiting superior performance on downstream tasks. 
653 |a Feature extraction 
653 |a Semantics 
653 |a Controllability 
653 |a Images 
653 |a Signal generation 
653 |a Anomalies 
653 |a Realism 
653 |a Synthesis 
700 1 |a Liu, Lei 
700 1 |a Zhao, Shen 
773 0 |t arXiv.org  |g (Dec 10, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3143054423/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.06510