Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions

Bibliographic Information
Published in: arXiv.org (Mar 29, 2024), p. n/a
Main Author: Zeng, Runhao
Other Authors: Chen, Xiaoyong; Liang, Jiaming; Wu, Huisi; Cao, Guangzhong; Guo, Yong
Publisher: Cornell University Library, arXiv.org
Subjects: Feature extraction; Source code; Video; Robustness; Benchmarks
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3028031898
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3028031898 
045 0 |b d20240329 
100 1 |a Zeng, Runhao 
245 1 |a Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions 
260 |b Cornell University Library, arXiv.org  |c Mar 29, 2024 
513 |a Working Paper 
520 3 |a Temporal action detection (TAD) aims to locate action positions and recognize action categories in long-term untrimmed videos. Although many methods have achieved promising results, their robustness has not been thoroughly studied. In practice, we observe that temporal information in videos can be occasionally corrupted, such as missing or blurred frames. Interestingly, existing methods often incur a significant performance drop even if only one frame is affected. To formally evaluate the robustness, we establish two temporal corruption robustness benchmarks, namely THUMOS14-C and ActivityNet-v1.3-C. In this paper, we extensively analyze the robustness of seven leading TAD methods and obtain some interesting findings: 1) Existing methods are particularly vulnerable to temporal corruptions, and end-to-end methods are often more susceptible than those with a pre-trained feature extractor; 2) Vulnerability mainly comes from localization error rather than classification error; 3) When corruptions occur in the middle of an action instance, TAD models tend to yield the largest performance drop. Besides building a benchmark, we further develop a simple but effective robust training method to defend against temporal corruptions, through the FrameDrop augmentation and Temporal-Robust Consistency loss. Remarkably, our approach not only improves robustness but also yields promising improvements on clean data. We believe that this study will serve as a benchmark for future research in robust video analysis. Source code and models are available at https://github.com/Alvin-Zeng/temporal-robustness-benchmark. 
653 |a Feature extraction 
653 |a Source code 
653 |a Video 
653 |a Robustness 
653 |a Benchmarks 
700 1 |a Chen, Xiaoyong 
700 1 |a Liang, Jiaming 
700 1 |a Wu, Huisi 
700 1 |a Cao, Guangzhong 
700 1 |a Guo, Yong 
773 0 |t arXiv.org  |g (Mar 29, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3028031898/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2403.20254
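For orientation beyond the abstract (field 520): the paper's robust-training recipe pairs a FrameDrop augmentation with a Temporal-Robust Consistency loss. The following is a minimal sketch of what such a recipe might look like, not the authors' implementation; the repeat-previous-frame corruption, the detached L2 target, and the names frame_drop / temporal_robust_consistency are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def frame_drop(clip: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Simulate missing frames by repeating the previous frame.

    clip: (T, C, H, W) float tensor of video frames.
    Repeating the predecessor is one plausible corruption; the paper's
    FrameDrop may instead zero out, blur, or interpolate dropped frames.
    """
    out = clip.clone()
    num_frames = clip.shape[0]
    drop_mask = torch.rand(num_frames) < drop_prob
    drop_mask[0] = False  # always keep the first frame as an anchor
    for t in range(1, num_frames):
        if drop_mask[t]:
            out[t] = out[t - 1]  # dropped frame inherits its predecessor
    return out

def temporal_robust_consistency(feat_clean: torch.Tensor,
                                feat_corrupt: torch.Tensor) -> torch.Tensor:
    """L2 consistency between clean and corrupted features (assumed form).

    The clean branch is detached so it acts as a fixed target, pushing
    the corrupted branch's features toward the clean ones.
    """
    return F.mse_loss(feat_corrupt, feat_clean.detach())

In a training loop this would be combined with the usual detection objective, e.g. loss = tad_loss + lam * temporal_robust_consistency(model.features(clip), model.features(frame_drop(clip))), where lam is a weighting hyperparameter; see the authors' repository at https://github.com/Alvin-Zeng/temporal-robustness-benchmark for the actual method.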