Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment

Bibliographic Details
Published in: arXiv.org (Aug 3, 2024), p. n/a
Main Author: Sahami, Sadid
Other Authors: Cheung, Gene; Chia-Wen, Lin
Published: Cornell University Library, arXiv.org
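Subjects: User generated content; Eigenvalues; Lower bounds; Similarity; Algorithms; Video data; Alignment; Matrices (mathematics); Frames (data processing); Signal reconstruction; Social networks; Sampling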
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3089694188
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3089694188 
045 0 |b d20240803 
100 1 |a Sahami, Sadid 
245 1 |a Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment 
260 |b Cornell University Library, arXiv.org  |c Aug 3, 2024 
513 |a Working Paper 
520 3 |a User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of \(N\) frames in a UGV as an \(M\)-hop path graph \(\mathcal{G}^o\) for \(M \ll N\), where the similarity between two frames within \(M\) time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" \(\mathcal{G}^o\) to a \(1\)-hop path graph \(\mathcal{G}\), specified by a generalized graph Laplacian matrix \(\mathcal{L}\), via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue \(\lambda_{\min}(\mathbf{B})\) of a coefficient matrix \(\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + \mu \mathcal{L}\), where \(\mathbf{h}\) is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound \(\lambda^-_{\min}(\mathbf{B})\) by choosing \(\mathbf{h}\) via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity. 
653 |a User generated content 
653 |a Eigenvalues 
653 |a Lower bounds 
653 |a Similarity 
653 |a Algorithms 
653 |a Video data 
653 |a Alignment 
653 |a Matrices (mathematics) 
653 |a Frames (data processing) 
653 |a Signal reconstruction 
653 |a Social networks 
653 |a Sampling 
700 1 |a Cheung, Gene 
700 1 |a Chia-Wen, Lin 
773 0 |t arXiv.org  |g (Aug 3, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3089694188/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2408.01859
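
Illustrative note on the abstract (field 520). The sketch below is not the authors' algorithm. It only illustrates the quantities the abstract names: a coefficient matrix B = diag(h) + mu * L on a 1-hop path graph, its smallest eigenvalue, and the Gershgorin circle theorem (GCT) lower bound given by the smallest disc left-end. Several things here are assumptions made for illustration: the use of a plain combinatorial Laplacian (the abstract speaks of a generalized graph Laplacian), the edge weights, the value of mu, the choice of h, and all function names. The last function rescales B with a diagonal similarity transform built from its first eigenvector, which aligns every disc left-end at the smallest eigenvalue. That shows why aligning left-ends tightens the bound, but computing an eigenvector is exactly what a fast sampling method would want to avoid, so it should be read as a demonstration only.

```python
import numpy as np


def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (assumed form).

    weights[i] is the positive weight of the edge between frame i and i + 1.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size + 1
    L = np.zeros((n, n))
    idx = np.arange(n - 1)
    L[idx, idx + 1] = -w
    L[idx + 1, idx] = -w
    # Diagonal holds node degrees, so every row sums to zero.
    L[np.arange(n), np.arange(n)] = -L.sum(axis=1)
    return L


def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h a binary keyframe selection vector."""
    return np.diag(np.asarray(h, dtype=float)) + mu * L


def smallest_eigenvalue(B):
    """Exact lambda_min(B) for a symmetric matrix B."""
    return float(np.linalg.eigvalsh(B)[0])


def gershgorin_lower_bound(B):
    """Smallest Gershgorin disc left-end: min_i (B_ii - sum_{j != i} |B_ij|)."""
    centers = np.diag(B)
    radii = np.abs(B).sum(axis=1) - np.abs(centers)
    return float(np.min(centers - radii))


def aligned_lower_bound(B):
    """GCT bound after a diagonal similarity transform S B S^{-1}.

    S = diag(1 / v), where v is the (positive) first eigenvector of B.
    The transform keeps the eigenvalues and aligns all disc left-ends.
    Demonstration only: it relies on an eigen-decomposition.
    """
    _, vecs = np.linalg.eigh(B)
    v = np.abs(vecs[:, 0])
    C = B * v[np.newaxis, :] / v[:, np.newaxis]  # C_ij = B_ij * v_j / v_i
    return gershgorin_lower_bound(C)


# Hypothetical example: 6 frames, unit edge weights, frames 1 and 4 selected.
L = path_laplacian(np.ones(5))
B = coefficient_matrix([0, 1, 0, 0, 1, 0], L, mu=0.5)
print("lambda_min(B):          ", smallest_eigenvalue(B))
print("GCT bound, unscaled:    ", gershgorin_lower_bound(B))
print("GCT bound, after scaling:", aligned_lower_bound(B))
```

With a combinatorial Laplacian, the left-end of disc i in the unscaled B is just h_i, so the unscaled bound stays at zero whenever any frame is left unselected. Rescaling the discs is what makes the lower bound informative.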