SMM-POD: Panoramic 3D Object Detection via Spherical Multi-Stage Multi-Modal Fusion

Bibliographic Details
Published in: Remote Sensing, vol. 17, no. 12 (2025), p. 2089-2111
Main Author: Zhang, Jinghan
Other Authors: Yang, Yusheng; Gao, Zhiyuan; Shi, Hang; Xie, Yangmin
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF
Description
Abstract: Panoramic 3D object detection is a challenging task due to image distortion, sensor heterogeneity, and the difficulty of fusing information from multiple modalities over a wide field of view (FoV). To address these issues, we propose SMM-POD, a novel framework that introduces a spherical multi-stage fusion strategy for panoramic 3D detection. Our approach creates a five-channel spherical image aligned with LiDAR data and uses a quasi-uniform Voronoi sphere (UVS) model to reduce projection distortion. A cross-attention-based feature extraction module and a transformer encoder–decoder with spherical positional encoding enable accurate and efficient fusion of image and point-cloud features. For precise 3D localization, we adopt a Frustum PointNet module. Experiments on the DAIR-V2X-I benchmark and our self-collected SHU-3DPOD dataset show that SMM-POD achieves state-of-the-art performance across all object categories. It significantly improves the detection of small objects such as cyclists and pedestrians and maintains stable results under varied environmental conditions. These results demonstrate the effectiveness of SMM-POD for panoramic multi-modal 3D perception and establish it as a strong baseline for wide-FoV object detection.
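The abstract describes aligning LiDAR data with a spherical image before fusion. As a rough illustration only — a plain equirectangular projection, not the paper's quasi-uniform Voronoi sphere (UVS) model, with all function names hypothetical — mapping LiDAR points to spherical image pixels can be sketched as:

```python
import numpy as np

def lidar_to_spherical(points):
    """Convert LiDAR points (N, 3) to azimuth, elevation, and range.

    Illustrative sketch; the paper's actual projection uses a
    quasi-uniform Voronoi sphere model to reduce distortion.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)      # range to each point
    azimuth = np.arctan2(y, x)              # in [-pi, pi]
    elevation = np.arcsin(z / r)            # in [-pi/2, pi/2]
    return azimuth, elevation, r

def angles_to_pixels(azimuth, elevation, height=64, width=2048):
    """Map spherical angles to (u, v) pixel indices of an
    equirectangular spherical image of the given resolution."""
    u = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    v = ((np.pi / 2 - elevation) / np.pi * (height - 1)).astype(int)
    return u, v
```

Rasterizing range (and, e.g., intensity) into such a grid is one common way to build a LiDAR-aligned spherical channel; an equirectangular grid like this oversamples near the poles, which is the distortion a quasi-uniform spherical partition is meant to mitigate.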
ISSN:2072-4292
DOI:10.3390/rs17122089
Source: Advanced Technologies & Aerospace Database