Selective Attention and Refinement for Efficient Small Object Detection
Saved in:
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | Zhang, Tianyi |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | Electrical engineering; Computer engineering; Computer science |
| Online access: | Citation/Abstract; Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3225325995 | ||
| 003 | UK-CbPIL | ||
| 020 | |a 9798286445868 | ||
| 035 | |a 3225325995 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 66569 |2 nlm | ||
| 100 | 1 | |a Zhang, Tianyi | |
| 245 | 1 | |a Selective Attention and Refinement for Efficient Small Object Detection | |
| 260 | |b ProQuest Dissertations & Theses |c 2025 | ||
| 513 | |a Dissertation/Thesis | ||
| 520 | 3 | |a Small object detection remains a critical yet challenging task in computer vision, with applications ranging from autonomous driving to surveillance and remote sensing. It is difficult because of limited visual cues, reduced spatial resolution, and the dominance of background regions in typical image frames. To address this, we propose a biologically inspired and progressively enhanced framework that improves both the efficiency and effectiveness of small object detection systems. Motivated by the selective attention capabilities of the human visual system, this thesis introduces a four-stage pipeline that progressively refines attention, resolution, and reconstruction quality for better detection performance. First, inspired by the selective attention mechanism in human vision, we introduce a saliency-based processing step at the CMOS image sensor that continuously selects pixels corresponding to salient objects and feeds this information back to the sensor, instead of blindly passing all pixels to the sensor output. To minimize the overhead of saliency detection in this feedback loop, we down-sample the input data and optimize the model structure while maintaining object detection (OD) performance on the selected regions. Specifically, our method achieves a 70.5% reduction in the volume of output pixels on BDD100K, which translates to 4.3× and 3.4× reductions in power consumption and latency, respectively. Second, we simplify the overly fine-grained pixel-level approach by adopting a patch-level selection strategy and propose Patch-SA, a patch-wise selection transformer designed for better accuracy and efficiency. Patch-SA constructs a feature pyramid to capture object information across scales and employs an expansive reconstruction pathway to progressively recover high-resolution features, yielding richer representations. Experimental evaluations on three widely used datasets demonstrate that, compared to previous pixel-based approaches, Patch-SA not only achieves higher selection accuracy with less information loss but also transmits far fewer pixels, thereby improving overall efficiency. Third, we couple the selection network with a transformer-based super-resolution (SR) module to form an end-to-end architecture, Selective Super-Resolution (SSR), which refines the spatial details of selected patches while suppressing reconstruction of unimportant background regions that may introduce extraneous or even detrimental information for downstream computer vision tasks. By deeply reconstructing only the object-containing patches, SSR improves image quality (lower FID) while reducing the computational cost of SR. In the final stage, we address a limitation of the transformer-based SR module: its constrained upscaling capacity prevents it from substantially improving small object detection. We therefore replace it with a diffusion-based generative model capable of high-fidelity reconstruction at larger scaling factors (8×). The strong generative capacity of diffusion models preserves the fine-grained object details that are essential for detecting small objects. Specifically, our approach improves the mean Average Precision (mAP) for small objects from 1.03 to 8.93 on the BDD100K dataset and reduces the computational data volume by over 77%, demonstrating both its effectiveness and efficiency. In summary, this dissertation presents a novel framework that combines biologically inspired selection, efficient patch-based attention, and advanced generative reconstruction to achieve state-of-the-art performance in small object detection. Extensive experiments across multiple datasets demonstrate the robustness and practicality of the proposed approach, paving the way for its deployment in real-world applications. (Illustrative sketches of the patch-level selection and selective super-resolution ideas follow the MARC record below.) | |
| 653 | |a Electrical engineering | ||
| 653 | |a Computer engineering | ||
| 653 | |a Computer science | ||
| 773 | 0 | |t ProQuest Dissertations and Theses |g (2025) | |
| 786 | 0 | |d ProQuest |t ProQuest Dissertations & Theses Global | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3225325995/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3225325995/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
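
The abstract's second stage describes patch-level selection: rank non-overlapping patches and pass only the object-likely ones downstream. The following minimal PyTorch sketch illustrates that idea only; it is not code from the dissertation. A toy CNN scorer (`PatchScorer`) stands in for Patch-SA's feature-pyramid transformer, and the fixed `keep_ratio` stands in for its learned selection criterion; all names here are assumptions for the example.

```python
# Illustrative sketch of patch-level selection (not the dissertation's Patch-SA):
# a small CNN scores each non-overlapping patch and only the top-scoring
# fraction is kept, so background patches are dropped before detection.
import torch
import torch.nn as nn


class PatchScorer(nn.Module):
    """Scores each non-overlapping patch; higher means more likely to contain an object."""

    def __init__(self, patch: int = 32):
        super().__init__()
        self.patch = patch
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W) with H and W divisible by self.patch.
        b, c, _, _ = img.shape
        p = self.patch
        patches = (
            img.unfold(2, p, p).unfold(3, p, p)   # (B, 3, H/p, W/p, p, p)
            .permute(0, 2, 3, 1, 4, 5)
            .reshape(-1, c, p, p)                 # (B * num_patches, 3, p, p)
        )
        return self.net(patches).view(b, -1)      # (B, num_patches)


def select_patches(img, scorer, keep_ratio=0.3):
    """Keep only the highest-scoring patches; the rest are never transmitted."""
    scores = scorer(img)
    k = max(1, int(keep_ratio * scores.shape[1]))
    kept = scores.topk(k, dim=1).indices          # indices of kept patches
    return kept, scores


if __name__ == "__main__":
    img = torch.rand(1, 3, 256, 256)
    kept, _ = select_patches(img, PatchScorer(patch=32), keep_ratio=0.3)
    print(f"kept {kept.shape[1]} of {(256 // 32) ** 2} patches")
```

With a 0.3 keep ratio on an 8×8 patch grid, only 19 of 64 patches move downstream, which is the kind of data-volume reduction the abstract quantifies on BDD100K.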
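
The third and fourth stages reconstruct only the selected patches at higher resolution. The sketch below is again illustrative: bicubic upsampling stands in for the dissertation's transformer- or diffusion-based SR module, and the function name `selective_sr`, the 32-pixel patch size, and the 8× scale are assumptions chosen to mirror the abstract's description.

```python
# Illustrative sketch of selective super-resolution: only the kept patches are
# upscaled (here with bicubic interpolation as a stand-in for a generative SR
# model), and the fraction of the frame actually processed is reported.
import torch
import torch.nn.functional as F


def selective_sr(img, kept, patch=32, scale=8):
    """Super-resolve only the kept patches; return the crops and the processed fraction."""
    _, _, h, w = img.shape
    per_row = w // patch
    sr_crops = []
    for idx in kept[0].tolist():                  # batch size 1 for simplicity
        r, c = divmod(idx, per_row)
        crop = img[:, :, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        sr_crops.append(
            F.interpolate(crop, scale_factor=scale, mode="bicubic", align_corners=False)
        )
    processed = len(sr_crops) / ((h // patch) * (w // patch))
    return sr_crops, processed


if __name__ == "__main__":
    img = torch.rand(1, 3, 256, 256)
    kept = torch.tensor([[0, 9, 27]])             # pretend these patches contain objects
    crops, frac = selective_sr(img, kept)
    print(f"upscaled {len(crops)} patches to {crops[0].shape[-1]} px; "
          f"processed {frac:.1%} of the frame")
```

Because background patches are never reconstructed, the cost of the SR stage scales with the number of object-containing patches rather than with the full frame, which is the efficiency argument the abstract makes for SSR and its diffusion-based successor.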