A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

Bibliographic Details
Published in:	arXiv.org (Feb 27, 2024), p. n/a
Main Author:	Chen, Zehui
Other Authors:	Wang, Qiuchen; Li, Zhenyu; Liu, Jiaming; Zhang, Shanghang; Zhao, Feng
Published:	Cornell University Library, arXiv.org
Subjects:	Visual tasks; Instance segmentation; Robustness (mathematics); Visual perception; Learning; Object recognition; Visual perception driven algorithms
Online Access:	Citation/Abstract; Full text outside of ProQuest

Description
Abstract:	In this report, we present our solution to the multi-task robustness track of the 1st Visual Continual Learning (VCL) Challenge at the ICCV 2023 Workshop. We propose a vanilla framework named UniNet that seamlessly combines various visual perception algorithms into a multi-task model. Specifically, we choose DETR3D, Mask2Former, and BinsFormer for the 3D object detection, instance segmentation, and depth estimation tasks, respectively. The final submission is a single model with an InternImage-L backbone that achieves an overall score of 49.6 (29.5 Det mAP, 80.3 mTPS, 46.4 Seg mAP, and 7.93 silog) on the SHIFT validation set. In addition, we report several observations from our experiments that may facilitate the development of multi-task learning in dense visual prediction.
ISSN:	2331-8422
Source:	Engineering Database