DLiteNet: A Dual-Branch Lightweight Framework for Efficient and Precise Building Extraction from Visible and SAR Imagery

Bibliographic Details
Published in: Remote Sensing, vol. 17, no. 24 (2025), pp. 3939-3964
Main Author: Zhao, Zhe
Other Authors: Zhao, Boya; Du, Ruitong; Wu, Yuanfeng; Chen, Jiaen; Zheng, Yuchen
Published: MDPI AG, 2025

MARC

LEADER 00000nab a2200000uu 4500
001 3286352446
003 UK-CbPIL
022 |a 2072-4292 
024 7 |a 10.3390/rs17243939  |2 doi 
035 |a 3286352446 
045 2 |b d20250101  |b d20251231 
084 |a 231556  |2 nlm 
100 1 |a Zhao, Zhe  |u Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; zhaozhe22@mails.ucas.ac.cn (Z.Z.); zhaoby@aircas.ac.cn (B.Z.); durt@aircas.ac.cn (R.D.) 
245 1 |a DLiteNet: A Dual-Branch Lightweight Framework for Efficient and Precise Building Extraction from Visible and SAR Imagery 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Highlights: What are the main findings? (1) A dual-branch lightweight multimodal framework (DLiteNet) is proposed. It decouples building extraction into a context branch for global semantics (via STDAC) and a CDAM-guided spatial branch for edges and details, with MCAM adaptively fusing visible–SAR features; removing the complex decoding stage enables efficient segmentation. (2) DLiteNet consistently outperforms state-of-the-art multimodal building-extraction methods on the DFC23 Track2 and MSAW datasets, achieving a strong efficiency–precision trade-off and showing clear potential for real-time on-board deployment. What is the implication of the main finding? (1) By removing complex decoding and adopting a dual-branch, task-decoupled design, DLiteNet shows that accurate visible–SAR building extraction is achievable under tight compute/memory budgets, enabling large-area, high-frequency mapping and providing a reusable blueprint for other multimodal segmentation tasks (e.g., roads, damage, change detection). (2) Its lightweight yet precise architecture makes real-time on-board deployment on UAVs and other edge platforms practical for city monitoring and rapid disaster response. Abstract: High-precision and efficient building extraction by fusing visible and synthetic aperture radar (SAR) imagery is critical for applications such as smart cities, disaster response, and UAV navigation. However, existing approaches often rely on complex multimodal feature extraction and deep fusion mechanisms, resulting in over-parameterized models and excessive computation, which makes it challenging to balance accuracy and efficiency. To address this issue, we propose a dual-branch lightweight architecture, DLiteNet, which functionally decouples the multimodal building extraction task into two sub-tasks: global context modeling and spatial detail capturing. Accordingly, we design a lightweight context branch and spatial branch to achieve an optimal trade-off between semantic accuracy and computational efficiency. The context branch jointly processes visible and SAR images, leveraging our proposed Multi-scale Context Attention Module (MCAM) to adaptively fuse multimodal contextual information, followed by a lightweight Short-Term Dense Atrous Concatenate (STDAC) module for extracting high-level semantics. The spatial branch focuses on capturing textures and edge structures from visible imagery and employs a Context-Detail Aggregation Module (CDAM) to fuse contextual priors and refine building contours. Experiments on the MSAW and DFC23 Track2 datasets demonstrate that DLiteNet achieves strong performance with only 5.6 M parameters and extremely low computational costs (51.7/5.8 GFLOPs), significantly outperforming state-of-the-art models such as CMGFNet (85.2 M, 490.9/150.3 GFLOPs) and MCANet (71.2 M, 874.5/375.9 GFLOPs). On the MSAW dataset, DLiteNet achieves the highest accuracy (83.6% IoU, 91.1% F1-score), exceeding the strongest baseline, MCANet, by 1.0% IoU and 0.6% F1-score. Furthermore, deployment tests on the Jetson Orin NX edge device show that DLiteNet achieves a low inference latency of 14.97 ms per frame under FP32 precision, highlighting its real-time capability and deployment potential in edge computing scenarios. (A minimal, hypothetical code sketch of this dual-branch design is given after the MARC record below.)
653 |a Accuracy 
653 |a Computer architecture 
653 |a Decoding 
653 |a Buildings 
653 |a Segmentation 
653 |a Tradeoffs 
653 |a Edge computing 
653 |a Cities 
653 |a Disaster management 
653 |a Damage detection 
653 |a Computer applications 
653 |a Modules 
653 |a Radar imaging 
653 |a Disasters 
653 |a Efficiency 
653 |a Datasets 
653 |a Semantics 
653 |a Remote sensing 
653 |a Synthetic aperture radar 
653 |a Neural networks 
653 |a Computational efficiency 
653 |a Computing costs 
653 |a Design 
653 |a Smart cities 
653 |a Architecture 
653 |a Latency 
653 |a Real time 
700 1 |a Zhao, Boya  |u Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; zhaozhe22@mails.ucas.ac.cn (Z.Z.); zhaoby@aircas.ac.cn (B.Z.); durt@aircas.ac.cn (R.D.) 
700 1 |a Du, Ruitong  |u Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; zhaozhe22@mails.ucas.ac.cn (Z.Z.); zhaoby@aircas.ac.cn (B.Z.); durt@aircas.ac.cn (R.D.) 
700 1 |a Wu, Yuanfeng  |u Key Laboratory of Computational Optical Imaging Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; zhaozhe22@mails.ucas.ac.cn (Z.Z.); zhaoby@aircas.ac.cn (B.Z.); durt@aircas.ac.cn (R.D.) 
700 1 |a Chen, Jiaen  |u College of Information Science and Technology, Shihezi University, Shihezi 832000, China; 20222108012@stu.shzu.edu.cn (J.C.); zhengyuchen@shzu.edu.cn (Y.Z.) 
700 1 |a Zheng, Yuchen  |u College of Information Science and Technology, Shihezi University, Shihezi 832000, China; 20222108012@stu.shzu.edu.cn (J.C.); zhengyuchen@shzu.edu.cn (Y.Z.) 
773 0 |t Remote Sensing  |g vol. 17, no. 24 (2025), p. 3939-3964 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3286352446/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3286352446/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3286352446/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
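
Illustrative sketch (not from the record source): the abstract describes a task-decoupled, dual-branch visible–SAR segmentation design, with a context branch (MCAM fusion followed by STDAC) operating on both modalities and a detail-oriented spatial branch (CDAM) operating on the visible image, combined without a heavy decoder. The minimal PyTorch sketch below only conveys what such a forward pass could look like; the class DualBranchSketch, its layer choices, and its channel widths are hypothetical placeholders, not the authors' DLiteNet implementation.

# Hypothetical sketch of a generic dual-branch visible+SAR segmentation
# forward pass. Module internals stand in for MCAM/STDAC/CDAM, whose real
# designs are not given in this record.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    # Basic 3x3 convolution block used by both branches.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class DualBranchSketch(nn.Module):
    def __init__(self, num_classes=1):
        super().__init__()
        # Context branch: jointly processes visible (3ch) + SAR (1ch),
        # downsampling aggressively to model global semantics
        # (placeholder for MCAM fusion + STDAC).
        self.context = nn.Sequential(
            conv_bn_relu(4, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
        )
        # Spatial branch: shallower path over the visible image only,
        # preserving edges and textures (placeholder for CDAM guidance).
        self.spatial = nn.Sequential(
            conv_bn_relu(3, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
        )
        # Decoder-free prediction head: a single 1x1 convolution.
        self.head = nn.Conv2d(128 + 64, num_classes, 1)

    def forward(self, visible, sar):
        ctx = self.context(torch.cat([visible, sar], dim=1))
        det = self.spatial(visible)
        # Fuse: upsample coarse context features to the detail resolution,
        # concatenate, predict, then restore the input resolution.
        ctx = F.interpolate(ctx, size=det.shape[-2:], mode="bilinear",
                            align_corners=False)
        logits = self.head(torch.cat([ctx, det], dim=1))
        return F.interpolate(logits, size=visible.shape[-2:],
                             mode="bilinear", align_corners=False)

if __name__ == "__main__":
    net = DualBranchSketch()
    vis = torch.randn(1, 3, 256, 256)   # visible patch
    sar = torch.randn(1, 1, 256, 256)   # co-registered SAR patch
    print(net(vis, sar).shape)          # torch.Size([1, 1, 256, 256])

The sketch is only meant to show the two-branch data flow and the decoder-free head that the abstract credits for the model's efficiency; the paper's reported configuration totals 5.6 M parameters, which this placeholder does not attempt to match.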