Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment

Bibliographic Details
Published in: arXiv.org (Aug 3, 2024), p. n/a
Main Author: Sahami, Sadid
Other Authors: Cheung, Gene; Chia-Wen, Lin
Published:
Cornell University Library, arXiv.org
Online Access: Citation/Abstract
Full text outside of ProQuest
Description
Abstract: User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of \(N\) frames in a UGV as an \(M\)-hop path graph \(\mathcal{G}^o\) for \(M \ll N\), where the similarity between two frames within \(M\) time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" \(\mathcal{G}^o\) to a \(1\)-hop path graph \(\mathcal{G}\), specified by a generalized graph Laplacian matrix \(\mathcal{L}\), via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue \(\lambda_{\min}(\mathbf{B})\) of a coefficient matrix \(\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + \mu \mathcal{L}\), where \(\mathbf{h}\) is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound \(\lambda^-_{\min}(\mathbf{B})\) by choosing \(\mathbf{h}\) via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity.
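To make the quantities in the abstract concrete, here is a minimal Python sketch, not the authors' algorithm. It builds \(\mathbf{B} = \textit{diag}(\mathbf{h}) + \mu \mathcal{L}\) for a small 1-hop path graph and compares \(\lambda_{\min}(\mathbf{B})\) with a Gershgorin lower bound. Several things here are assumptions made only for illustration: the edge weights, the value of \(\mu\), the selection vector `h`, the scaling vector `s`, and the use of a plain combinatorial Laplacian in place of the generalized Laplacian the abstract mentions. The function names are hypothetical.

```python
import numpy as np

def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (illustrative stand-in)."""
    w = np.asarray(weights, dtype=float)
    n = w.size + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = w          # edge between frame i and frame i+1
    W[idx + 1, idx] = w
    return np.diag(W.sum(axis=1)) - W

def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h a binary selection vector."""
    return np.diag(np.asarray(h, dtype=float)) + mu * L

def gershgorin_lower_bound(B, s=None):
    """Smallest Gershgorin disc left-end of S B S^{-1}, S = diag(s), s > 0.

    A similarity transform keeps the eigenvalues of B, so this value is a
    valid lower bound on lambda_min(B) for any positive s.
    """
    n = B.shape[0]
    s = np.ones(n) if s is None else np.asarray(s, dtype=float)
    scaled = (s[:, None] * B) / s[None, :]
    centers = np.diag(scaled)
    radii = np.abs(scaled).sum(axis=1) - np.abs(centers)
    return float((centers - radii).min())

# Assumed toy data: 6 frames, 5 similarity weights, 2 selected keyframes.
weights = [0.9, 0.8, 0.7, 0.9, 0.6]
L = path_laplacian(weights)
h = np.array([0, 1, 0, 0, 1, 0])
B = coefficient_matrix(h, L, mu=0.5)

lam_min = float(np.linalg.eigvalsh(B)[0])        # true smallest eigenvalue
raw_bound = gershgorin_lower_bound(B)            # no disc scaling
scaled_bound = gershgorin_lower_bound(B, s=[1.1, 0.9, 1.1, 1.1, 0.9, 1.1])

print(lam_min, raw_bound, scaled_bound)
```

Without scaling, each disc's left-end equals the corresponding entry of `h`, so the bound drops to zero as soon as one frame is left unselected, even though \(\lambda_{\min}(\mathbf{B})\) is strictly positive. This gap suggests why the disc left-ends are aligned through scaling before the bound is maximized. The scaling vector above is arbitrary and is not claimed to tighten the bound.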
ISSN: 2331-8422
Source: Engineering Database