Probabilistic Scene Graph Generation and Its Applications

Bibliographic details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Biswas, Bashirul Azam
Published: ProQuest Dissertations & Theses
Description
Abstract: Scene graphs encode relationships between image entities as triplets (subject-relationship-object), where nodes represent grounded entities and directed edges define relationships from the subject to the object. The Scene Graph Generation (SGG) task faces significant challenges, including difficulty detecting small or occluded entities and classifying entities and relationships due to imbalanced class distributions and ambiguous annotations. As a result, SGG models often suffer from low accuracy and a bias toward frequently occurring classes. Existing methods employ techniques such as re-weighting training samples or post-processing inference results to mitigate the bias. However, these approaches often compromise overall accuracy, as they trade off general model performance for a more balanced class distribution. In this thesis, we leverage prior knowledge of scene graph triplets to enhance accuracy and mitigate bias in trained SGG models in a principled manner. We propose a Bayesian Network (BN) to capture the stable within-triplet prior and a Conditional Random Field (CRF) to model the between-triplet prior of scene graph triplets. BN inference, when applied to uncertain evidence from a biased SGG model, improves overall accuracy while mitigating bias. The CRF further refines predictions by integrating unary potentials derived from the BN posterior with pairwise potentials representing the between-triplet prior learned from triplet co-occurrence statistics. Beyond improving performance in static scene graphs, we explore the challenge of integrating both static and temporal potentials in Dynamic Scene Graph (DSG) generation. Existing methods implicitly assume that all relationships in DSGs are purely temporal, neglecting their static components.
To address this, we propose a Transformer-based CRF model that effectively captures both static and long-term temporal potentials, demonstrating its superiority over traditional Transformer-based approaches. Finally, we showcase the effectiveness of scene graphs as a bridge for Visual Question Answering (VQA). Prior works on SG-based VQA assume that every question can be answered from a perfect scene graph alone, leading to poor performance on questions unrelated to the scene graph. To overcome this limitation, we introduce an uncertainty-guided approach that combines predictions from two Bayesian ensembles: one for image-based VQA and another for SG-based VQA, ensuring more robust and accurate question answering.
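As a minimal illustration of the within-triplet prior idea sketched in the abstract, the snippet below re-ranks a biased model's relationship scores with a prior estimated from triplet co-occurrence statistics. The class names, the scores, and the simple Bayes-rule update are all hypothetical stand-ins, not the thesis's actual BN/CRF machinery.

```python
import numpy as np

# Candidate relationships for a hypothetical (person, horse) entity pair.
relationships = ["on", "riding", "holding"]

# Biased SGG model output: the head class "on" dominates the softmax.
model_scores = np.array([0.70, 0.20, 0.10])

# Within-triplet prior P(relationship | subject, object), e.g. estimated
# from annotation co-occurrence counts (numbers are illustrative).
prior = np.array([0.10, 0.80, 0.10])

# Posterior ∝ likelihood × prior, renormalized over relationship classes.
posterior = model_scores * prior
posterior /= posterior.sum()

print(relationships[int(np.argmax(posterior))])  # prints "riding"
```

After re-ranking, the tail class "riding" overtakes the frequent class "on", which is the debiasing effect the abstract attributes to applying BN inference to the model's uncertain evidence.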
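One plausible reading of the uncertainty-guided VQA combination is to answer from whichever ensemble (image-based or SG-based) is less uncertain about its prediction. The sketch below uses predictive entropy as the uncertainty measure and made-up answer distributions; it is an assumption for illustration, not the thesis's exact method.

```python
import numpy as np

def entropy(p):
    """Predictive entropy (in nats) of a categorical distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def uncertainty_guided_answer(p_image, p_sg):
    """Select the branch whose ensemble prediction has lower entropy.
    A hypothetical selection rule for combining image-based and
    SG-based VQA predictions."""
    return p_image if entropy(p_image) < entropy(p_sg) else p_sg

# Hypothetical answer distributions over three candidate answers.
p_image = np.array([0.40, 0.35, 0.25])  # uncertain image-based branch
p_sg = np.array([0.90, 0.05, 0.05])     # confident SG-based branch

chosen = uncertainty_guided_answer(p_image, p_sg)
```

Here the scene-graph branch is far more confident, so its distribution is chosen; for a question unrelated to the scene graph, its entropy would typically be high and the image branch would answer instead.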
ISBN:9798280713369
Source: ProQuest Dissertations & Theses Global