DLiteNet: A Dual-Branch Lightweight Framework for Efficient and Precise Building Extraction from Visible and SAR Imagery
| Published in: | Remote Sensing, vol. 17, no. 24 (2025), pp. 3939-3964 |
|---|---|
| Main author: | |
| Other authors: | |
| Published: | MDPI AG |
| Subjects: | |
| Online access: | Citation/Abstract · Full Text + Graphics · Full Text - PDF |
**Summary:**

*What are the main findings?*

- A dual-branch lightweight multimodal framework (DLiteNet) is proposed. It decouples building extraction into a context branch for global semantics (via STDAC) and a CDAM-guided spatial branch for edges and details, with MCAM adaptively fusing visible–SAR features. Removing the complex decoding stage enables efficient segmentation.
- DLiteNet consistently outperforms state-of-the-art multimodal building-extraction methods on the DFC23 Track2 and MSAW datasets, achieving a strong efficiency–precision trade-off and demonstrating strong potential for real-time on-board deployment.

*What is the implication of the main finding?*

- By removing complex decoding and adopting a dual-branch, task-decoupled design, DLiteNet shows that accurate visible–SAR building extraction is achievable under tight compute/memory budgets, enabling large-area, high-frequency mapping and providing a reusable blueprint for other multimodal segmentation tasks (e.g., roads, damage, change detection).
- Its lightweight yet precise architecture makes real-time on-board deployment on UAVs and other edge platforms practical for city monitoring and rapid disaster response.

*Abstract:*

High-precision and efficient building extraction by fusing visible and synthetic aperture radar (SAR) imagery is critical for applications such as smart cities, disaster response, and UAV navigation. However, existing approaches often rely on complex multimodal feature extraction and deep fusion mechanisms, resulting in over-parameterized models and excessive computation, which makes it challenging to balance accuracy and efficiency. To address this issue, we propose a dual-branch lightweight architecture, DLiteNet, which functionally decouples the multimodal building extraction task into two sub-tasks: global context modeling and spatial detail capturing. Accordingly, we design a lightweight context branch and spatial branch to achieve an optimal trade-off between semantic accuracy and computational efficiency. The context branch jointly processes visible and SAR images, leveraging our proposed Multi-scale Context Attention Module (MCAM) to adaptively fuse multimodal contextual information, followed by a lightweight Short-Term Dense Atrous Concatenate (STDAC) module for extracting high-level semantics. The spatial branch focuses on capturing textures and edge structures from visible imagery and employs a Context-Detail Aggregation Module (CDAM) to fuse contextual priors and refine building contours. Experiments on the MSAW and DFC23 Track2 datasets demonstrate that DLiteNet achieves strong performance with only 5.6 M parameters and extremely low computational costs (51.7/5.8 GFLOPs), significantly outperforming state-of-the-art models such as CMGFNet (85.2 M, 490.9/150.3 GFLOPs) and MCANet (71.2 M, 874.5/375.9 GFLOPs). On the MSAW dataset, DLiteNet achieves the highest accuracy (83.6% IoU, 91.1% F1-score), exceeding the best MCANet baseline by 1.0% IoU and 0.6% F1-score. Furthermore, deployment tests on the Jetson Orin NX edge device show that DLiteNet achieves a low inference latency of 14.97 ms per frame under FP32 precision, highlighting its real-time capability and deployment potential in edge computing scenarios.
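The record carries only the abstract, but the data flow it describes is concrete enough to sketch. The PyTorch fragment below is a minimal structural sketch under stated assumptions: a shared context branch fuses visible and SAR features (the paper's MCAM), extracts high-level semantics with dilated convolutions (STDAC), and a higher-resolution spatial branch over the visible image is aggregated with upsampled context priors (CDAM) before a direct upsampling head stands in for a heavy decoder. The internals of `MCAMSketch`, `STDACSketch`, and the `cdam` fusion convolution, along with all channel widths and strides, are illustrative guesses; only the branch layout and module roles come from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MCAMSketch(nn.Module):
    """Hypothetical stand-in for the Multi-scale Context Attention Module:
    pools concatenated visible+SAR features at two scales and uses the pooled
    context to weight each modality per pixel before blending them."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, vis: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        cat = torch.cat([vis, sar], dim=1)
        # Multi-scale context: average-pool at two window sizes, upsample back.
        ctx = sum(
            F.interpolate(F.avg_pool2d(cat, k), size=cat.shape[-2:],
                          mode="bilinear", align_corners=False)
            for k in (2, 4)
        ) / 2
        w = self.gate(ctx)            # per-pixel modality weight in [0, 1]
        return w * vis + (1 - w) * sar


class STDACSketch(nn.Module):
    """Hypothetical Short-Term Dense Atrous Concatenate block: atrous
    convolutions at increasing dilation rates, each fed the dense concat of
    the input and all earlier outputs, then projected back to the input width."""

    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        growth = channels // len(rates)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=r, dilation=r)
            for i, r in enumerate(rates)
        )
        self.proj = nn.Conv2d(channels + growth * len(rates), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.branches:
            feats.append(conv(torch.cat(feats, dim=1)))  # dense connectivity
        return self.proj(torch.cat(feats, dim=1))


class DLiteNetSketch(nn.Module):
    """Structural sketch of the dual-branch design described in the abstract;
    not the published implementation."""

    def __init__(self, ch: int = 64, num_classes: int = 1):
        super().__init__()
        # Context branch: aggressively downsampled, processes both modalities.
        self.stem_vis = nn.Conv2d(3, ch, 3, stride=4, padding=1)
        self.stem_sar = nn.Conv2d(1, ch, 3, stride=4, padding=1)
        self.mcam = MCAMSketch(ch)
        self.stdac = STDACSketch(ch)
        # Spatial branch: higher resolution, visible imagery only.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # CDAM stand-in: fuse upsampled context priors with spatial detail.
        self.cdam = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, num_classes, 1)

    def forward(self, vis: torch.Tensor, sar: torch.Tensor) -> torch.Tensor:
        ctx = self.stdac(self.mcam(self.stem_vis(vis), self.stem_sar(sar)))
        det = self.spatial(vis)
        ctx_up = F.interpolate(ctx, size=det.shape[-2:],
                               mode="bilinear", align_corners=False)
        logits = self.head(F.relu(self.cdam(torch.cat([ctx_up, det], dim=1))))
        # Direct upsampling to input resolution: no heavy decoder stage.
        return F.interpolate(logits, size=vis.shape[-2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = DLiteNetSketch()
    vis = torch.randn(1, 3, 512, 512)   # visible image
    sar = torch.randn(1, 1, 512, 512)   # single-band SAR image
    print(net(vis, sar).shape)          # torch.Size([1, 1, 512, 512])
```

Note that the 5.6 M-parameter and GFLOPs figures quoted above refer to the authors' actual model; this sketch only mirrors the topology, not the compute budget.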
| ISSN: | 2072-4292 |
|---|---|
| DOI: | 10.3390/rs17243939 |
| Source: | Advanced Technologies & Aerospace Database |
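For context on the Jetson Orin NX figure (14.97 ms per frame at FP32), per-frame latency for a model like this is usually measured with a warm-up phase followed by a device-synchronized timing loop. The harness below is a generic PyTorch sketch of that procedure, not the authors' protocol; the warm-up count, iteration count, and stand-in model are arbitrary choices.

```python
import time
import torch


def measure_latency_ms(model, inputs, warmup: int = 20, iters: int = 100) -> float:
    """Rough per-frame FP32 latency in milliseconds; a generic benchmarking
    sketch, not the paper's Jetson Orin NX measurement protocol."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm up caches / cuDNN autotuner
            model(*inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # GPU launches are async; sync first
        t0 = time.perf_counter()
        for _ in range(iters):
            model(*inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # flush pending kernels before stopping
        return (time.perf_counter() - t0) * 1000 / iters


if __name__ == "__main__":
    net = torch.nn.Conv2d(3, 1, 3, padding=1)   # stand-in model
    x = torch.randn(1, 3, 512, 512)
    print(f"{measure_latency_ms(net, (x,)):.2f} ms/frame")
```

On a CUDA device, the synchronization calls around the timed loop are what make the per-frame figure meaningful, since kernel launches return before the GPU finishes the work.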