Self-Similarity in Deep Neural Network Modules for Images and Videos

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Gauen, Kent
Published: ProQuest Dissertations & Theses
Description
Abstract: Self-similarity within and across video frames is used in today's best methods for image/video restoration and generation, in the form of attention and warping modules within deep neural networks (DNNs). Self-similarity is a microcosm of the greater mission to understand images by first understanding components within the image. While this design philosophy mirrors how generic DNNs build features, DNNs are at the mercy of many practical limitations, such as available training data, the specified loss function, network size, and training time. Even when properly scaled up, DNNs are still limited by observed correlations. Robust generalization requires that DNNs learn underlying scientific principles, but today's DNNs have not demonstrated this understanding. In short, DNNs should be learning scientific relationships, but they do not, because science is not in the data. This limitation is the inspiration for a larger research goal: to shift DNNs from data-only learning to data-driven learning by designing architectures that explicitly incorporate assumptions about our data via inductive bias. This thesis contributes new ideas to principally improve DNNs for images and videos by incorporating our knowledge of self-similarity into modules.

In Chapter 2, we hypothesize that the attention operator within DNNs for video denoising acts as an optimal denoiser rather than as a complicated or abstract transformation of features. Using this connection, we design a new search module, the shifted neighborhood search, to improve the space-time attention module. This search step is a method to identify self-similar regions. We show this simple grid search is of higher quality than existing DNN alternatives. Our implementation of the grid search is computationally efficient, and it is accompanied by a user-friendly Python+PyTorch package. Our findings suggest a perhaps obvious greater lesson: explicitly computing a desired quantity is better than learning it from data. The success of attention modules suggests that sparse, data-dependent memory access is important, but rather than learning how to run this search, we can use simple assumptions about images (self-similarity) to improve it.

As Chapter 2 demonstrates the importance of selecting the best neighbors, Chapter 3 presents a new way to use these selected neighbors. Ordinarily, an attention module re-weights each point according to the similarity between the query point and a grid of key points. However, in the presence of noise, this similarity is unreliable. The impact of noise can be mitigated by first clustering pixels into deformably shaped, self-similar regions. This chapter proposes a re-weighting step that augments neighborhood attention using these clusters of pixels, called superpixels. By viewing superpixel similarities as part of the image formation process, we show this re-weighted attention operator corresponds to an optimal denoiser that is a re-weighted variation of the naive one.

With the hope of extending our single-image superpixel method to space-time, we noticed that the fastest space-time superpixel methods execute at only about two frames per second. So in Chapter 4, we present a new method for space-time superpixels that runs at nearly 60 frames per second. We estimate space-time superpixels by hill-climbing to a local mode of a Dirichlet-Process Gaussian Mixture Model (DP-GMM) conditioned on the previous frame's superpixel information. The DP-GMM allows for principled splitting and merging of superpixels, which can explain disocclusion due to motion and lets the number of superpixels adapt to the image's content. While alternative methods confine each superpixel to a particular square grid within the image, our space-time superpixels are not restricted in this way. Hence our space-time superpixels are "off the grid."
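To make the Chapter 2 hypothesis concrete: the attention output is a similarity-weighted average of neighbor values, which is exactly the form of a non-local, similarity-weighted estimator. A minimal sketch of this reading, with illustrative symbols (q the query, k_i and v_i the keys and values) rather than the thesis's own notation:

```latex
% Illustrative only: the attention output as a similarity-weighted average,
% the form one expects of a (non-local) denoiser.
\[
  \hat{x}_q \;=\; \sum_i
    \frac{\exp\!\big(\mathrm{sim}(q, k_i)\big)}
         {\sum_j \exp\!\big(\mathrm{sim}(q, k_j)\big)}\, v_i
\]
```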
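The abstract describes the shifted neighborhood search only at a high level. The following sketch shows the kind of exhaustive grid search over a space-time window it refers to; the function name, signature, and L2 patch distance are illustrative assumptions, not the API of the thesis's released Python+PyTorch package.

```python
# Hedged sketch: exhaustive grid search for self-similar patches across a
# space-time window, in the spirit of Chapter 2's shifted neighborhood search.
import torch

def grid_search(frames, t, y, x, patch=7, window=15, k=10):
    """Return the k locations whose patches best match the patch at (t, y, x).

    frames: (T, H, W) tensor; (y, x) is assumed at least patch//2 from borders.
    """
    T, H, W = frames.shape
    r, w = patch // 2, window // 2
    query = frames[t, y - r:y + r + 1, x - r:x + r + 1]
    dists, coords = [], []
    for ti in range(T):                                    # search every frame
        for yi in range(max(r, y - w), min(H - r, y + w + 1)):
            for xi in range(max(r, x - w), min(W - r, x + w + 1)):
                cand = frames[ti, yi - r:yi + r + 1, xi - r:xi + r + 1]
                dists.append(float(((query - cand) ** 2).mean()))  # L2 distance
                coords.append((ti, yi, xi))
    top = torch.tensor(dists).argsort()[:k]                # k smallest distances
    return [coords[i] for i in top]
```

An attention module could then aggregate values only at the returned (frame, row, col) coordinates, which is the sparse, data-dependent memory access the abstract refers to.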
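Chapter 3's re-weighting step can likewise be sketched: standard attention weights are scaled by a superpixel co-membership term and renormalized, so the output remains a weighted average and hence a denoiser of the same family. The term `coprob` and all names below are illustrative assumptions; the thesis derives the exact weighting from the image formation process.

```python
# Hedged sketch of superpixel re-weighted neighborhood attention (Chapter 3).
import torch

def reweighted_attention(values, sims, coprob):
    """values: (N, C) neighbor features; sims: (N,) query-key similarities;
    coprob: (N,) chance each neighbor shares the query's superpixel."""
    w = torch.softmax(sims, dim=0)   # standard attention weights
    w = w * coprob                   # trust in-superpixel neighbors more
    w = w / w.sum()                  # renormalize to a weighted average
    return w @ values                # (C,) denoised output feature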
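Finally, the Chapter 4 estimator hill-climbs to a local mode of a DP-GMM. A much-simplified, single-frame sketch of that loop follows; the split/merge moves and the conditioning on the previous frame are omitted, isotropic clusters are assumed, and all names are illustrative.

```python
# Hedged sketch: hill-climbing toward a mode of a Gaussian mixture over
# per-pixel features (a simplification of Chapter 4's DP-GMM procedure).
import torch

def hill_climb_labels(feats, labels, n_iters=5):
    """feats: (P, D) per-pixel features, e.g. (x, y, color); labels: (P,) ids."""
    for _ in range(n_iters):
        ks = labels.unique()                     # clusters still in use
        means = torch.stack([feats[labels == k].mean(dim=0) for k in ks])
        # Reassign each pixel to its nearest cluster mean; under isotropic
        # Gaussians this step cannot decrease the model likelihood, so the
        # loop climbs toward a local mode.
        labels = ks[torch.cdist(feats, means).argmin(dim=1)]
    return labels
```

Because cluster means are recomputed only over surviving labels, clusters may empty out as the loop runs, a crude stand-in for the principled merging that the DP-GMM provides.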
ISBN: 9798290635477
Source: ProQuest Dissertations & Theses Global