Probabilistic Scene Graph Generation and Its Applications
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Published: | ProQuest Dissertations & Theses |
| Online access: | Citation/Abstract; Full Text - PDF |
| Abstract: | Scene graphs encode relationships between image entities as triplets (subject-relationship-object), where nodes represent grounded entities and directed edges define relationships from the subject to the object. The Scene Graph Generation (SGG) task faces significant challenges, including difficulty in detecting small or occluded entities and in classifying entities and relationships due to imbalanced class distributions and ambiguous annotations. As a result, SGG models often suffer from low accuracy and a bias toward frequently occurring classes. Existing methods employ techniques such as re-weighting training samples or post-processing inference results to mitigate this bias. However, these approaches often compromise overall accuracy, as they trade general model performance for a more balanced class distribution. In this thesis, we leverage prior knowledge of scene graph triplets to enhance accuracy and mitigate bias in trained SGG models in a principled manner. We propose a Bayesian Network (BN) to capture the stable within-triplet prior and a Conditional Random Field (CRF) to model the between-triplet prior of scene graph triplets. BN inference, when applied to uncertain evidence from a biased SGG model, improves overall accuracy while mitigating bias. The CRF further refines predictions by integrating unary potentials derived from the BN posterior with pairwise potentials representing the between-triplet prior, learned from triplet co-occurrence statistics. Beyond improving performance on static scene graphs, we explore the challenge of integrating both static and temporal potentials in Dynamic Scene Graph (DSG) generation. Existing methods implicitly assume that all relationships in a DSG are purely temporal, neglecting their static components. To address this, we propose a Transformer-based CRF model that effectively captures both static and long-term temporal potentials, demonstrating its superiority over traditional Transformer-based approaches. Finally, we showcase the effectiveness of scene graphs as a bridge for Visual Question Answering (VQA). Prior work on SG-based VQA assumes that every question can be answered solely from a perfect scene graph, leading to poor performance on questions unrelated to the scene graph. To overcome this limitation, we introduce an uncertainty-guided approach that combines predictions from two Bayesian ensembles, one for image-based VQA and another for SG-based VQA, ensuring more robust and accurate question answering. |
|---|---|
| ISBN: | 9798280713369 |
| Source: | ProQuest Dissertations & Theses Global |
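
A minimal sketch of the priors described in the abstract, with illustrative notation chosen here (subject $s$, relationship $r$, object $o$, triplet labels $t_i$, potentials $\psi$ and $\phi$ are assumptions, not the thesis's own symbols): the within-triplet prior can be written as a Bayesian Network factorization over the triplet variables, and the refined labeling can be read as the MAP assignment of a CRF whose unary potentials use the BN posterior and whose pairwise potentials encode triplet co-occurrence statistics.

```latex
% Possible BN factorization of the within-triplet prior (one of several
% valid variable orderings; shown here only as an illustration):
\[
  P(s, r, o) \;=\; P(s)\, P(o \mid s)\, P(r \mid s, o)
\]

% CRF refinement: unary potentials \psi_i score triplet label t_i using
% the BN posterior; pairwise potentials \phi_{ij} score pairs of triplets
% using co-occurrence statistics. The prediction is the MAP assignment:
\[
  \hat{\mathbf{t}} \;=\; \arg\max_{\mathbf{t}}
  \sum_{i} \psi_i(t_i)
  \;+\; \sum_{(i,j)} \phi_{ij}(t_i, t_j)
\]
```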