ELFT: Efficient local-global fusion transformer for small object detection
| Published in: | PLoS One vol. 20, no. 9 (Sep 2025), p. e0332714 |
|---|---|
| Main Author: | Hua, Guoguang |
| Other Authors: | Wu, Fangfang; Hao, Guangzhao; Xia, Chenbo; Li, Li |
| Published: | Public Library of Science |
| Subjects: | |
| Online Access: | Citation/Abstract; Full Text; Full Text - PDF |
MARC
| Tag | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3254055215 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 1932-6203 |
| 024 | 7 | | \|a 10.1371/journal.pone.0332714 \|2 doi |
| 035 | | | \|a 3254055215 |
| 045 | 2 | | \|b d20250901 \|b d20250930 |
| 084 | | | \|a 174835 \|2 nlm |
| 100 | 1 | | \|a Hua, Guoguang |
| 245 | 1 | | \|a ELFT: Efficient local-global fusion transformer for small object detection |
| 260 | | | \|b Public Library of Science \|c Sep 2025 |
| 513 | | | \|a Journal Article |
| 520 | 3 | | \|a Small object detection is an essential but challenging task in computer vision. Transformer-based algorithms have demonstrated remarkable performance across computer vision tasks. Nevertheless, they extract inadequate features for small objects, and their heavy computational burden makes them difficult to deploy on resource-constrained platforms. To tackle these problems, an efficient local-global fusion Transformer (ELFT) based on attention and a grouping strategy is proposed for small object detection. Specifically, we first design an efficient local-global fusion attention (ELGFA) mechanism that extracts sufficient location features and integrates detailed information from feature maps, thereby improving accuracy. In addition, we present a grouped feature update module (GFUM) that reduces computational complexity by alternately updating high-level and low-level features within each group. Furthermore, a broadcast context module (CB) is introduced to obtain richer context information, which further enhances the ability to detect small objects. Extensive experiments on three benchmarks, i.e. Remote Sensing Object Detection (RSOD), NWPU VHR-10 and PASCAL VOC2007, yield a mean average precision (mAP) of 95.8%, 94.3% and 85.2%, respectively. Compared to DINO, ELFT reduces the number of parameters by 10.4% and the floating point operations (FLOPs) by 22.7%. The experimental results demonstrate the efficacy of ELFT in small object detection tasks while maintaining an attractive level of computational complexity. |
| 653 | | | \|a Feature extraction |
| 653 | | | \|a Remote sensing |
| 653 | | | \|a Benchmarks |
| 653 | | | \|a Accuracy |
| 653 | | | \|a Computer vision |
| 653 | | | \|a Neural networks |
| 653 | | | \|a Task complexity |
| 653 | | | \|a Feature maps |
| 653 | | | \|a Attention |
| 653 | | | \|a Design |
| 653 | | | \|a Integrated approach |
| 653 | | | \|a Computer applications |
| 653 | | | \|a Information processing |
| 653 | | | \|a Modules |
| 653 | | | \|a Algorithms |
| 653 | | | \|a Telematics |
| 653 | | | \|a Object recognition |
| 653 | | | \|a Localization |
| 653 | | | \|a Floating point arithmetic |
| 653 | | | \|a Semantics |
| 653 | | | \|a Economic |
| 700 | 1 | | \|a Wu, Fangfang |
| 700 | 1 | | \|a Hao, Guangzhao |
| 700 | 1 | | \|a Xia, Chenbo |
| 700 | 1 | | \|a Li, Li |
| 773 | 0 | | \|t PLoS One \|g vol. 20, no. 9 (Sep 2025), p. e0332714 |
| 786 | 0 | | \|d ProQuest \|t Health & Medical Collection |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3254055215/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text \|u https://www.proquest.com/docview/3254055215/fulltext/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3254055215/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |