Study on an Improved YOLOv7-Based Algorithm for Human Head Detection

Saved in:
Bibliographic Details
Published in: Electronics vol. 14, no. 9 (2025), p. 1889
Main Author: Wu, Dong
Other Authors: Yan, Weidong; Wang, Jingli
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3203195677
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14091889  |2 doi 
035 |a 3203195677 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Wu, Dong 
245 1 |a Study on an Improved YOLOv7-Based Algorithm for Human Head Detection 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a In response to the decreased accuracy in person detection caused by dense crowds and mutual occlusion in public spaces, a human head-detection approach is employed to assist in detecting individuals. To address key issues in dense scenes, such as poor feature extraction, coarse label assignment, and inefficient pooling, we improved the YOLOv7 network in three aspects: adding an attention mechanism, enhancing the receptive field, and applying multi-scale feature fusion. First, a large amount of surveillance video from crowded public spaces was collected to compile a head-detection dataset. Then, based on YOLOv7, the network was optimized as follows: (1) a CBAM attention module was added to the neck section; (2) a Gaussian receptive field-based label-assignment strategy (RFLAGauss) was implemented at the junction between the original feature-fusion module and the detection head; (3) the SPPFCSPC module was used to replace the original spatial pyramid pooling module. By uniting CBAM, RFLAGauss, and SPPFCSPC, the improved network forms a collaborative optimization framework. Finally, experimental comparisons showed that the improved model's accuracy increased from 92.4% to 94.4%, recall improved from 90.5% to 93.9%, and inference speed increased from 87.2 to 94.2 frames per second. Compared with single-stage object-detection models such as YOLOv7 and YOLOv8, the model demonstrated superior accuracy and inference speed. Its inference speed also significantly outperforms that of Faster R-CNN, Mask R-CNN, DINOv2, and RT-DETRv2, markedly enhancing both small-object (head) detection performance and efficiency. 
653 |a Feature extraction 
653 |a Accuracy 
653 |a Public spaces 
653 |a Cameras 
653 |a Labels 
653 |a Collaboration 
653 |a Deep learning 
653 |a Frames (data processing) 
653 |a Computer vision 
653 |a Pedestrians 
653 |a Artificial neural networks 
653 |a Frames per second 
653 |a Shopping centers 
653 |a Inference 
653 |a Algorithms 
653 |a Video data 
653 |a Modules 
653 |a Surveillance 
653 |a Object recognition 
653 |a Localization 
653 |a Crowds 
653 |a Efficiency 
700 1 |a Yan, Weidong 
700 1 |a Wang, Jingli 
773 0 |t Electronics  |g vol. 14, no. 9 (2025), p. 1889 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3203195677/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3203195677/fulltextwithgraphics/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3203195677/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch
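
For orientation, the following is a minimal PyTorch sketch of the CBAM attention module named in the abstract: channel attention followed by spatial attention, each applied multiplicatively to a feature map. The reduction ratio, kernel size, and tensor shapes are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumed default
        super().__init__()
        # Shared MLP applied to both the average-pooled and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # global average pooling
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # global max pooling
        return torch.sigmoid(avg + mx)  # per-channel weights in (0, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):  # 7x7 is the commonly used choice
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)      # channel-wise average map
        mx, _ = torch.max(x, dim=1, keepdim=True)     # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # per-pixel weights

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)  # reweight channels
        x = x * self.sa(x)  # reweight spatial positions
        return x

# Usage: refine a hypothetical 256-channel neck feature map; output shape is unchanged.
feat = torch.randn(1, 256, 40, 40)
out = CBAM(256)(feat)

SPPFCSPC, also named in the abstract, follows the SPPF idea: three sequential 5x5 max-pools whose intermediate outputs are concatenated, reproducing the 5/9/13 receptive fields of parallel pooling at lower computational cost.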