ELFT: Efficient local-global fusion transformer for small object detection

Bibliographic Details
Published in: PLoS One vol. 20, no. 9 (Sep 2025), p. e0332714
Main Author: Hua, Guoguang
Other Authors: Wu, Fangfang; Hao, Guangzhao; Xia, Chenbo; Li, Li
Published:
Public Library of Science
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3254055215
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0332714  |2 doi 
035 |a 3254055215 
045 2 |b d20250901  |b d20250930 
084 |a 174835  |2 nlm 
100 1 |a Hua, Guoguang 
245 1 |a ELFT: Efficient local-global fusion transformer for small object detection 
260 |b Public Library of Science  |c Sep 2025 
513 |a Journal Article 
520 3 |a Small object detection is an essential but challenging task in computer vision. Transformer-based algorithms have demonstrated remarkable performance on computer vision tasks. Nevertheless, they extract inadequate features for small objects, and their heavy computational burden makes them difficult to deploy on resource-constrained platforms. To tackle these problems, an efficient local-global fusion Transformer (ELFT) is proposed for small object detection, based on attention and a grouping strategy. Specifically, we first design an efficient local-global fusion attention (ELGFA) mechanism to extract sufficient location features and integrate detailed information from feature maps, thereby improving accuracy. In addition, we present a grouped feature update module (GFUM) that reduces computational complexity by alternately updating high-level and low-level features within each group. Furthermore, a broadcast context module (CB) is introduced to obtain richer context information, which further enhances the ability to detect small objects. In extensive experiments on three benchmarks, i.e., Remote Sensing Object Detection (RSOD), NWPU VHR-10 and PASCAL VOC2007, ELFT achieves 95.8%, 94.3% and 85.2% mean average precision (mAP), respectively. Compared to DINO, the number of parameters is reduced by 10.4% and the floating point operations (FLOPs) are reduced by 22.7%. The experimental results demonstrate the efficacy of ELFT in small object detection tasks while maintaining an attractive level of computational complexity. 
653 |a Feature extraction 
653 |a Remote sensing 
653 |a Benchmarks 
653 |a Accuracy 
653 |a Computer vision 
653 |a Neural networks 
653 |a Task complexity 
653 |a Feature maps 
653 |a Attention 
653 |a Design 
653 |a Integrated approach 
653 |a Computer applications 
653 |a Information processing 
653 |a Modules 
653 |a Algorithms 
653 |a Telematics 
653 |a Object recognition 
653 |a Localization 
653 |a Floating point arithmetic 
653 |a Semantics 
653 |a Economic 
700 1 |a Wu, Fangfang 
700 1 |a Hao, Guangzhao 
700 1 |a Xia, Chenbo 
700 1 |a Li, Li 
773 0 |t PLoS One  |g vol. 20, no. 9 (Sep 2025), p. e0332714 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3254055215/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3254055215/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3254055215/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch