Disentangled Motion Modeling for Video Frame Interpolation

Bibliographic details
Published in:	arXiv.org (Dec 19, 2024), p. n/a
Main author:	Lew, Jaihyun
Other authors:	Choi, Jooyoung; Shin, Chaehun; Jung, Dahuin; Yoon, Sungroh
Published:	Cornell University Library, arXiv.org
Subjects:	Computing costs; Pixels; Smoothness; Frames (data processing); Optical flow (image analysis); Interpolation; Computational efficiency
Online access:	Citation/Abstract; Full text outside of ProQuest

Description
Abstract:	Video Frame Interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality. Beyond conventional methods based on the reconstruction loss, recent works have employed generative models for improved perceptual quality. However, these models require complex training and incur high computational costs because they model the pixel space directly. In this paper, we introduce Disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose a disentangled two-stage training process. In the initial stage, frame synthesis and flow models are trained to generate accurate frames and flows optimal for synthesis. In the subsequent stage, we introduce a motion diffusion model, which incorporates our novel U-Net architecture specifically designed for optical flow, to generate bi-directional flows between frames. By learning the simpler low-frequency representation of motions, MoMo achieves superior perceptual quality with reduced computational demands compared to generative modeling methods that operate in pixel space. MoMo surpasses state-of-the-art methods in perceptual metrics across various benchmarks, demonstrating its efficacy and efficiency in VFI.
ISSN:	2331-8422
Source:	Engineering Database
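
Illustrative note: the abstract describes generating bi-directional flows between frames and then synthesizing the intermediate frame from them. The sketch below shows only the generic flow-based step that this idea relies on, under stated assumptions: the flows are given rather than produced by a diffusion model, and a plain linear blend stands in for the learned synthesis network. All function names (`bilinear_sample`, `backward_warp`, `synthesize_midframe`) are hypothetical and are not taken from the paper or its code.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2D grayscale image (list of rows) at a fractional (x, y).

    Coordinates are clamped to the image border (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(img, flow):
    """Backward warping: out[y][x] = img sampled at (x + u, y + v).

    flow[y][x] is a (u, v) displacement from the target frame to the source.
    """
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


def synthesize_midframe(frame0, frame1, flow_t0, flow_t1, t=0.5):
    """Warp both input frames to time t and blend them linearly.

    The linear blend is a stand-in assumption; MoMo uses a trained
    synthesis model instead.
    """
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    return [
        [(1 - t) * a + t * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(warped0, warped1)
    ]
```

For example, if a bright pixel sits at x=1 in the first frame and x=3 in the second, flows of (-1, 0) toward the first frame and (+1, 0) toward the second place it at x=2 in the interpolated frame.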