Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment
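| Published in: | arXiv.org (Aug 3, 2024), p. n/a |
|---|---|
| Main author: | Sahami, Sadid |
| Other authors: | Cheung, Gene; Chia-Wen, Lin |
| Published: | Cornell University Library, arXiv.org |
| Subjects: | User generated content; Eigenvalues; Lower bounds; Similarity; Algorithms; Video data; Alignment; Matrices (mathematics); Frames (data processing); Signal reconstruction; Social networks; Sampling |
| Online access: | Citation/Abstract; Full text outside of ProQuest |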
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3089694188 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2331-8422 | ||
| 035 | |a 3089694188 | ||
| 045 | 0 | |b d20240803 | |
| 100 | 1 | |a Sahami, Sadid | |
| 245 | 1 | |a Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment | |
| 260 | |b Cornell University Library, arXiv.org |c Aug 3, 2024 | ||
| 513 | |a Working Paper | ||
| 520 | 3 | |a User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of \(N\) frames in a UGV as an \(M\)-hop path graph \(\mathcal{G}^o\) for \(M \ll N\), where the similarity between two frames within \(M\) time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" \(\mathcal{G}^o\) to a \(1\)-hop path graph \(\mathcal{G}\), specified by a generalized graph Laplacian matrix \(\mathcal{L}\), via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue \(\lambda_{\min}(\mathbf{B})\) of a coefficient matrix \(\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + \mu \mathcal{L}\), where \(\mathbf{h}\) is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound \(\lambda^-_{\min}(\mathbf{B})\) by choosing \(\mathbf{h}\) via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity. | |
| 653 | |a User generated content | ||
| 653 | |a Eigenvalues | ||
| 653 | |a Lower bounds | ||
| 653 | |a Similarity | ||
| 653 | |a Algorithms | ||
| 653 | |a Video data | ||
| 653 | |a Alignment | ||
| 653 | |a Matrices (mathematics) | ||
| 653 | |a Frames (data processing) | ||
| 653 | |a Signal reconstruction | ||
| 653 | |a Social networks | ||
| 653 | |a Sampling | ||
| 700 | 1 | |a Cheung, Gene | |
| 700 | 1 | |a Chia-Wen, Lin | |
| 773 | 0 | |t arXiv.org |g (Aug 3, 2024), p. n/a | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3089694188/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u http://arxiv.org/abs/2408.01859 |
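
The abstract in field 520 describes a coefficient matrix \(\mathbf{B} = \textit{diag}(\mathbf{h}) + \mu \mathcal{L}\) and a Gershgorin circle theorem lower bound on its smallest eigenvalue. The following is a minimal sketch of that bound only, not the paper's sampling algorithm. It assumes a plain combinatorial Laplacian for a 1-hop path graph (the abstract refers to a generalized Laplacian), and the helper names `path_laplacian`, `coefficient_matrix`, and `gershgorin_lower_bound`, the edge weights, and the scaling vector `s` are all hypothetical choices made for illustration. The scaling relies on the standard fact that a similarity transform \(\mathbf{S}\mathbf{B}\mathbf{S}^{-1}\) with diagonal \(\mathbf{S}\) preserves eigenvalues while shifting the disc left-ends.

```python
def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph (assumed form).

    weights[i] is the positive edge weight between frame i and frame i + 1.
    """
    n = len(weights) + 1
    L = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights):
        L[i][i] += w
        L[i + 1][i + 1] += w
        L[i][i + 1] -= w
        L[i + 1][i] -= w
    return L


def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h the binary keyframe selection vector."""
    n = len(h)
    return [
        [mu * L[i][j] + (h[i] if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def gershgorin_lower_bound(B, s=None):
    """Smallest Gershgorin disc left-end of S B S^{-1}, with S = diag(s).

    With s = None no scaling is applied. Any positive s gives a valid
    lower bound on the smallest eigenvalue of B.
    """
    n = len(B)
    if s is None:
        s = [1.0] * n
    left_ends = []
    for i in range(n):
        # Disc radius of row i after the similarity transform.
        radius = sum(s[i] * abs(B[i][j]) / s[j] for j in range(n) if j != i)
        left_ends.append(B[i][i] - radius)
    return min(left_ends)


if __name__ == "__main__":
    # Three frames, unit edge weights, middle frame chosen as the keyframe.
    L = path_laplacian([1.0, 1.0])
    B = coefficient_matrix([0, 1, 0], L, mu=1.0)
    print("unscaled bound:", gershgorin_lower_bound(B))
    # Hand-picked scaling (hypothetical): enlarge the sampled node's disc
    # so that its neighbours' discs shrink and their left-ends move right.
    print("scaled bound:  ", gershgorin_lower_bound(B, s=[1.0, 1.25, 1.0]))
```

In this toy case the unscaled bound is 0 because every unsampled node's disc has its left-end at \(h_i = 0\), whereas the hand-picked scaling lifts the bound to about 0.2. The true smallest eigenvalue of this \(3 \times 3\) matrix is \(2 - \sqrt{3} \approx 0.268\), so both values are valid lower bounds and the scaled one is much tighter, which is the intuition behind aligning disc left-ends.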