Infrared-Visible Image Fusion Meets Object Detection: Towards Unified Optimization for Multimodal Perception

Published in: Remote Sensing vol. 17, no. 21 (2025), p. 3637-3663
Author: Xiantai, Xiang
Other Authors: Zhou, Guangyao; Niu, Ben; Pan, Zongxu; Huang, Lijia; Li, Wenshuai; Wen, Zixiao; Qi, Jiamin; Gao, Wanxin
Publisher:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3271544926
003 UK-CbPIL
022 |a 2072-4292 
024 7 |a 10.3390/rs17213637  |2 doi 
035 |a 3271544926 
045 2 |b d20250101  |b d20251231 
084 |a 231556  |2 nlm 
100 1 |a Xiantai, Xiang  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
245 1 |a Infrared-Visible Image Fusion Meets Object Detection: Towards Unified Optimization for Multimodal Perception 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a What are the main findings? • Our proposed UniFusOD method integrates infrared-visible image fusion and object detection into a unified, end-to-end framework, achieving superior performance across multiple tasks. • The introduction of the Fine-Grained Region Attention (FRA) module and UnityGrad optimization significantly enhances the model’s ability to handle multi-scale features and resolves gradient conflicts, improving both fusion and detection outcomes. What are the implications of the main findings? • The unified optimization approach not only improves image fusion quality but also enhances downstream task performance, particularly in detecting rotated and small objects. • This approach demonstrates significant robustness across various datasets, offering a promising solution for multimodal perception tasks in remote sensing and autonomous driving. Infrared-visible image fusion and object detection are crucial components in remote sensing applications, each offering unique advantages. Recent research has increasingly sought to combine these tasks to enhance object detection performance. However, the integration of these tasks presents several challenges, primarily due to two overlooked issues: (i) existing infrared-visible image fusion methods often fail to adequately focus on fine-grained or dense information, and (ii) while joint optimization methods can improve fusion quality and downstream task performance, their multi-stage training processes often reduce efficiency and limit the network’s global optimization capability. To address these challenges, we propose the UniFusOD method, an efficient end-to-end framework that simultaneously optimizes both infrared-visible image fusion and object detection tasks. 
The method integrates Fine-Grained Region Attention (FRA) for region-specific attention operations at different granularities, enhancing the model’s ability to capture complex information. Furthermore, UnityGrad is introduced to balance the gradient conflicts between fusion and detection tasks, stabilizing the optimization process. Extensive experiments demonstrate the superiority and robustness of our approach. Not only does UniFusOD achieve excellent results in image fusion, but it also provides significant improvements in object detection performance. The method exhibits remarkable robustness across various tasks, achieving a 0.8 and 1.9 mAP50 improvement over state-of-the-art methods on the DroneVehicle dataset for rotated object detection and the M3FD dataset for horizontal object detection, respectively. 
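The abstract states that UnityGrad balances gradient conflicts between the fusion and detection losses, but the record does not specify the mechanism. As a hedged illustration only — not the paper's actual algorithm — the general idea behind such multi-task gradient surgery can be sketched in the style of PCGrad-like projection: when two task gradients point in conflicting directions (negative dot product), each is projected onto the normal plane of the other before being combined. The function name `unity_grad_step` is hypothetical.

```python
import numpy as np

def unity_grad_step(g_fusion, g_det):
    """Illustrative multi-task gradient combination (PCGrad-style sketch,
    NOT the paper's UnityGrad): if the fusion and detection gradients
    conflict, project each onto the normal plane of the other, then sum."""
    g_f, g_d = g_fusion.copy(), g_det.copy()
    if np.dot(g_fusion, g_det) < 0:  # conflicting update directions
        # remove from each gradient its component along the other
        g_f = g_fusion - (np.dot(g_fusion, g_det) / np.dot(g_det, g_det)) * g_det
        g_d = g_det - (np.dot(g_det, g_fusion) / np.dot(g_fusion, g_fusion)) * g_fusion
    return g_f + g_d  # combined, conflict-free update direction
```

With conflicting inputs such as `[1, 0]` and `[-1, 1]`, each projected gradient becomes orthogonal to the other original gradient, so the summed update no longer decreases either task's progress.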
653 |a Remote sensing 
653 |a Datasets 
653 |a Deep learning 
653 |a Global optimization 
653 |a Optimization 
653 |a Attention 
653 |a Perception 
653 |a Infrared imagery 
653 |a Remote sensing systems 
653 |a Computer vision 
653 |a Image quality 
653 |a Robustness (mathematics) 
653 |a Object recognition 
653 |a Visual perception 
653 |a Multisensor fusion 
653 |a Semantics 
700 1 |a Zhou Guangyao  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Niu, Ben  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Pan Zongxu  |u School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China; panzx@xjtu.edu.cn 
700 1 |a Huang, Lijia  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Li Wenshuai  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Wen Zixiao  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Qi Jiamin  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; xiangxiantai23@mails.ucas.ac.cn (X.X.); wenzixiao22@mails.ucas.ac.cn (Z.W.); 
700 1 |a Gao Wanxin  |u School of Automation, Beijing Institute of Technology, Beijing 100081, China 
773 0 |t Remote Sensing  |g vol. 17, no. 21 (2025), p. 3637-3663 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3271544926/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3271544926/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3271544926/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch