ImVoxelGNet: Image to voxels geometry-aware projection for multi-view RGB-based 3D object detection

Bibliographic Details
Published in: PLoS One vol. 20, no. 5 (May 2025), p. e0320589
Main Author: Xu, Gang
Other Authors: Leng, Biao, Zhang, Xiong
Published:
Public Library of Science
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3205743834
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0320589  |2 doi 
035 |a 3205743834 
045 2 |b d20250501  |b d20250531 
084 |a 174835  |2 nlm 
100 1 |a Xu, Gang 
245 1 |a ImVoxelGNet: Image to voxels geometry-aware projection for multi-view RGB-based 3D object detection 
260 |b Public Library of Science  |c May 2025 
513 |a Journal Article 
520 3 |a 3D object detection based solely on image data presents a significant challenge in computer vision, primarily due to the need to integrate geometric perception processes derived from visual inputs. The key to overcoming this challenge lies in effectively capturing the geometric relationships across multiple viewpoints, thereby establishing strong geometric priors. Current methods commonly back-project voxels onto images to align voxel-pixel features, yet during this process, pixel features are insufficiently involved in learning, leading to a decrease in geometric perception accuracy and, consequently, impacting detection performance. To address this limitation, we propose a novel network framework called ImVoxelGNet. This framework first integrates features projected onto pixels via an expansion operation, compensating for the pixel information inadequately utilized in traditional back-projection methods, thus enabling more precise learning of spatial geometric features. Additionally, we design an implicit geometric perception structure that further refines the spatial geometric features obtained after integrating image features, learning the occupancy relationships in spatial voxels and updating them within the spatial features. Finally, we generate the final prediction results by combining a detection head with 3D convolutions. Evaluation on the ScanNetV2 multi-view 3D object detection dataset demonstrates that ImVoxelGNet achieves a performance improvement of up to 2.2% in mean average precision (mAP). This improvement effectively demonstrates the efficacy of our method in significantly enhancing 3D object detection performance through improved geometric perception and comprehensive scene understanding. Code and data are released at https://github.com/xug-coder/ImVoxelGNet. 
653 |a Visual perception 
653 |a Pixels 
653 |a Scene analysis 
653 |a Perception 
653 |a Images 
653 |a Computer vision 
653 |a Methods 
653 |a Information processing 
653 |a Object recognition 
653 |a Research & development--R&D 
653 |a Performance evaluation 
653 |a Neighborhoods 
653 |a Geometry 
653 |a Spatial discrimination learning 
653 |a Robotics 
653 |a Environmental 
700 1 |a Leng, Biao 
700 1 |a Zhang, Xiong 
773 0 |t PLoS One  |g vol. 20, no. 5 (May 2025), p. e0320589 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3205743834/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3205743834/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3205743834/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch