Probabilistic Scene Graph Generation and Its Applications

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Biswas, Bashirul Azam
Published: ProQuest Dissertations & Theses
Subjects: Electrical engineering; Mathematics; Applied mathematics
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3216675606
003 UK-CbPIL
020 |a 9798280713369 
035 |a 3216675606 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Biswas, Bashirul Azam 
245 1 |a Probabilistic Scene Graph Generation and Its Applications 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Scene graphs encode relationships between image entities as triplets (subject-relationship-object), where nodes represent grounded entities and directed edges define relationships from the subject to the object. The Scene Graph Generation (SGG) task faces significant challenges, including difficulty detecting small or occluded entities and difficulty classifying entities and relationships because of imbalanced class distributions and ambiguous annotations. As a result, SGG models often suffer from low accuracy and a bias toward frequently occurring classes. Existing methods employ techniques such as re-weighting training samples or post-processing inference results to mitigate the bias. However, these approaches often compromise overall accuracy, as they trade general model performance for a more balanced distribution of predicted classes. In this thesis, we leverage prior knowledge of scene graph triplets to enhance accuracy and mitigate bias in trained SGG models in a principled manner. We propose a Bayesian Network (BN) to capture the stable within-triplet prior and a Conditional Random Field (CRF) to model the between-triplet prior of scene graph triplets. BN inference, when applied to uncertain evidence from a biased SGG model, both improves overall accuracy and mitigates bias. The CRF further refines predictions by integrating unary potentials derived from the BN posterior with pairwise potentials that represent the between-triplet prior learned from triplet co-occurrence statistics. Beyond improving performance on static scene graphs, we explore the challenge of integrating both static and temporal potentials in Dynamic Scene Graph (DSG) generation. Existing methods implicitly assume that all relationships in DSG are purely temporal, neglecting their static components. 
To address this, we propose a Transformer-based CRF model that captures both static and long-term temporal potentials, and we demonstrate its superiority over traditional Transformer-based approaches. Finally, we show the effectiveness of scene graphs as a bridge for Visual Question Answering (VQA). Prior work on SG-based VQA assumes that every question can be answered solely from a perfect scene graph, which leads to poor performance on questions unrelated to the scene graph. To overcome this limitation, we introduce an uncertainty-guided approach that combines predictions from two Bayesian ensembles, one for image-based VQA and another for SG-based VQA, yielding more robust and accurate question answering. 
653 |a Electrical engineering 
653 |a Mathematics 
653 |a Applied mathematics 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3216675606/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3216675606/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
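
The 520 abstract states that BN inference is applied to uncertain evidence from a biased SGG model. A minimal sketch of that idea, assuming a single joint prior P(s, p, o) built from hypothetical triplet counts and treating the SGG model's class scores as virtual-evidence likelihoods on the subject, predicate, and object. The labels, counts, scores, and the helper names `triplet_posterior` and `predicate_marginal` are illustrative assumptions; this record does not give the thesis's network structure or evidence model.

```python
import numpy as np

# Hypothetical toy vocabulary (illustration only).
SUBJECTS = ["man", "woman"]
PREDICATES = ["on", "riding"]
OBJECTS = ["horse", "chair"]


def triplet_posterior(prior, lik_s, lik_p, lik_o):
    """Posterior over (subject, predicate, object) under virtual evidence.

    prior: (S, P, O) array proportional to P(s, p, o); here assumed to be
        raw triplet co-occurrence counts (normalization happens below).
    lik_s, lik_p, lik_o: 1-D class scores from an SGG model, treated as
        likelihoods P(evidence | class) for each variable (an assumption).
    """
    # Broadcast each likelihood along its own axis and multiply into the prior.
    joint = (prior
             * lik_s[:, None, None]
             * lik_p[None, :, None]
             * lik_o[None, None, :])
    return joint / joint.sum()


def predicate_marginal(posterior):
    # Sum out subject and object to get P(predicate | evidence).
    return posterior.sum(axis=(0, 2))


# Hypothetical triplet counts, indexed [subject, predicate, object].
counts = np.array([
    [[1.0, 4.0],    # man   on     {horse, chair}
     [3.0, 0.0]],   # man   riding {horse, chair}
    [[1.0, 3.0],    # woman on     {horse, chair}
     [2.0, 0.0]],   # woman riding {horse, chair}
])

# Hypothetical SGG scores: fairly sure of "man" and "horse", biased toward "on".
lik_s = np.array([0.9, 0.1])
lik_p = np.array([0.6, 0.4])
lik_o = np.array([0.9, 0.1])

post = triplet_posterior(counts, lik_s, lik_p, lik_o)
pred = predicate_marginal(post)
s, p, o = np.unravel_index(post.argmax(), post.shape)
print({k: round(float(v), 3) for k, v in zip(PREDICATES, pred)})  # {'on': 0.426, 'riding': 0.574}
print(SUBJECTS[s], PREDICATES[p], OBJECTS[o])  # man riding horse
```

With these toy numbers the model's own predicate scores favor "on", while the posterior favors "riding" because the prior makes "man riding horse" more frequent than "man on horse". Zero counts make the corresponding triplets impossible; smoothing the counts (for example, adding a small constant) avoids that.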
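
The uncertainty-guided VQA combination described in the same abstract can be sketched in a similar spirit. This is a minimal sketch under stated assumptions: each "Bayesian ensemble" is approximated by a set of member softmax outputs, uncertainty is measured as the entropy of the ensemble-averaged answer distribution, and the branch with lower entropy is trusted. The selection rule, the function names `ensemble_mean`, `entropy`, and `combine`, and all numbers are hypothetical; the record does not specify how the thesis weighs the two ensembles.

```python
import numpy as np


def ensemble_mean(member_probs):
    """Average the answer distributions of M ensemble members: (M, A) -> (A,)."""
    return np.asarray(member_probs, dtype=float).mean(axis=0)


def entropy(p, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())


def combine(image_members, sg_members):
    """Hypothetical rule: trust whichever branch is less uncertain.

    Returns (branch_name, answer_index). Ties go to the SG branch.
    """
    p_img = ensemble_mean(image_members)
    p_sg = ensemble_mean(sg_members)
    if entropy(p_sg) <= entropy(p_img):
        return "sg", int(p_sg.argmax())
    return "image", int(p_img.argmax())


ANSWERS = ["yes", "no", "red"]  # toy answer vocabulary (hypothetical)

# Image-based ensemble: members agree the answer is "red".
image_members = [[0.05, 0.05, 0.90],
                 [0.10, 0.05, 0.85],
                 [0.05, 0.10, 0.85]]
# SG-based ensemble: near-uniform, e.g. the question is not covered by the graph.
sg_members = [[0.40, 0.30, 0.30],
              [0.30, 0.40, 0.30],
              [0.35, 0.30, 0.35]]

branch, idx = combine(image_members, sg_members)
print(branch, ANSWERS[idx])  # image red
```

A hard switch keeps the example short; a soft variant would instead weight the two averaged distributions by a decreasing function of their entropies.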