Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation

Bibliographic Details
Published in:	Remote Sensing vol. 17, no. 16 (2025), p. 2805-2830
Main author:	Zhou, Fuyang
Other authors:	He, Haiqing; Chen, Ting; Zhang, Tao; Yang, Minglu; Ye, Yuan; Liu, Jiahao
Publisher:	MDPI AG
Subjects:	Labels; Datasets; Deep learning; Forestry; Lidar; Artificial neural networks; Image annotation; Mapping; Data processing; Unmanned aerial vehicles; Semantic segmentation; Trees; Performance evaluation; Statistical analysis; Forests; Image segmentation; Carbon sequestration; Three dimensional models; Instance segmentation; Image acquisition; Methods; Algorithms; Semantics

Description
Abstract:	Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. In forest environments, however, the task faces two key challenges: high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored to individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a three-stage pipeline of “2D semantic inference, 3D spatial mapping, cross-modal fusion” that enables annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. We then develop a hierarchical two-stage registration strategy to align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation-maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, achieving spatially consistent label mapping and thereby transferring semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results show that the individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, outperforming state-of-the-art methods such as Mask R-CNN, SOLOv2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios.
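Illustrative sketch (not part of the catalog record and not the authors' implementation): the multi-view 2D-to-3D label transfer idea in the abstract can be approximated, under strong simplifying assumptions, by projecting each 3D point into every segmented image and accumulating confidence-weighted votes per label. The pinhole camera model, the dictionary layout, and the names `project` and `transfer_labels` are all hypothetical; the paper's expectation-maximization step and geometric features are omitted here.

```python
from collections import defaultdict


def project(point, cam):
    """Project a 3D point to pixel (u, v) with a simple pinhole model.

    Assumption: the camera looks along +z with no rotation; `cam` holds a
    hypothetical focal length `f`, principal point (`cx`, `cy`) and `center`.
    """
    x = point[0] - cam["center"][0]
    y = point[1] - cam["center"][1]
    z = point[2] - cam["center"][2]
    if z <= 0:
        return None  # point is behind the camera
    u = int(round(cam["f"] * x / z + cam["cx"]))
    v = int(round(cam["f"] * y / z + cam["cy"]))
    return u, v


def transfer_labels(points, views):
    """Return (best_label, label_probabilities) for every 3D point.

    Each view contributes one vote per visible point, weighted by the 2D
    segmentation confidence at the projected pixel. This plain voting is a
    stand-in for the probabilistic model described in the abstract.
    """
    results = []
    for p in points:
        votes = defaultdict(float)
        for view in views:
            uv = project(p, view["cam"])
            if uv is None:
                continue
            u, v = uv
            labels = view["labels"]
            if not (0 <= v < len(labels) and 0 <= u < len(labels[0])):
                continue  # projection falls outside the image
            votes[labels[v][u]] += view["conf"][v][u]
        total = sum(votes.values())
        if total == 0:
            results.append((None, {}))  # no view saw this point
            continue
        probs = {label: w / total for label, w in votes.items()}
        results.append((max(probs, key=probs.get), probs))
    return results


# Toy example: two 3x3 label masks (tree IDs 1 and 2, 0 = background).
cam_a = {"f": 2.0, "cx": 1.0, "cy": 1.0, "center": (0.0, 0.0, 0.0)}
cam_b = {"f": 2.0, "cx": 1.0, "cy": 1.0, "center": (0.5, 0.0, 0.0)}
views = [
    {"cam": cam_a,
     "labels": [[0, 0, 0], [0, 1, 2], [0, 0, 0]],
     "conf": [[0.9] * 3 for _ in range(3)]},
    {"cam": cam_b,
     "labels": [[0, 0, 0], [2, 2, 0], [0, 0, 0]],
     "conf": [[0.3] * 3 for _ in range(3)]},
]
points = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (5.0, 5.0, 1.0)]
result = transfer_labels(points, views)
```

In the toy data the two views disagree on the first point, and the higher-confidence view decides its label; the third point projects outside both images and stays unlabeled.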
ISSN:	2072-4292
DOI:	10.3390/rs17162805
Source:	Advanced Technologies & Aerospace Database