Probabilistic Scene Graph Generation and Its Applications

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2025)
Main Author:	Biswas, Bashirul Azam
Published:	ProQuest Dissertations & Theses
Subjects:	Electrical engineering; Mathematics; Applied mathematics
Online Access:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3216675606
003	UK-CbPIL
020			\|a 9798280713369
035			\|a 3216675606
045	2		\|b d20250101 \|b d20251231
084			\|a 66569 \|2 nlm
100	1		\|a Biswas, Bashirul Azam
245	1		\|a Probabilistic Scene Graph Generation and Its Applications
260			\|b ProQuest Dissertations & Theses \|c 2025
513			\|a Dissertation/Thesis
520	3		\|a Scene graphs encode relationships between image entities as triplets (subject-relationship-object), where nodes represent grounded entities and directed edges define relationships from the subject to the object. The Scene Graph Generation (SGG) task faces significant challenges, including difficulty detecting small or occluded entities and classifying entities and relationships due to imbalanced class distributions and ambiguous annotations. As a result, SGG models often suffer from low accuracy and a bias toward frequently occurring classes. Existing methods employ techniques such as re-weighting training samples or post-processing inference results to mitigate the bias. However, these approaches often compromise overall accuracy, as they trade off general model performance for a more balanced class distribution. In this thesis, we leverage prior knowledge of scene graph triplets to enhance accuracy and mitigate bias in trained SGG models in a principled manner. We propose a Bayesian Network (BN) to capture the stable within-triplet prior and a Conditional Random Field (CRF) to model the between-triplet prior of scene graph triplets. BN inference, when applied to uncertain evidence from a biased SGG model, improves the overall accuracy as well as mitigates bias. The CRF further refines predictions by integrating unary potentials derived from the BN posterior with pairwise potentials, representing the between-triplet prior learned from triplet co-occurrence statistics. Beyond improving performance in static scene graphs, we explore the challenge of integrating both static and temporal potentials in Dynamic Scene Graph (DSG) generation. Existing methods implicitly assume that all relationships in DSG are purely temporal, neglecting their static components. To address this, we propose a Transformer-based CRF model that effectively captures both static and long-term temporal potentials, demonstrating its superiority over traditional Transformer-based approaches. Finally, we showcase the effectiveness of scene graphs as a bridge for Visual Question Answering (VQA). Prior works on SG-based VQA assume that every question can be answered solely from the perfect scene graph, leading to poor performance on questions unrelated to the scene graph. To overcome this limitation, we introduce an uncertainty-guided approach that combines predictions from two Bayesian ensembles: one for image-based VQA and another for SG-based VQA, ensuring more robust and accurate question answering.
653			\|a Electrical engineering
653			\|a Mathematics
653			\|a Applied mathematics
773	0		\|t ProQuest Dissertations and Theses \|g (2025)
786	0		\|d ProQuest \|t ProQuest Dissertations & Theses Global
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3216675606/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3216675606/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch