MV-Adapter: Multi-view Consistent Image Generation Made Easy

Bibliographic Details
Published in: arXiv.org (Dec 4, 2024), p. n/a
Main Author: Huang, Zehuan
Other Authors: Guo, Yuan-Chen; Wang, Haoran; Yi, Ran; Ma, Lizhuang; Cao, Yan-Pei; Sheng, Lu
Published: Cornell University Library, arXiv.org
Subjects: Image degradation; Image resolution; Image quality; Versatility; Image processing; Parameters; Quality standards; Adapters; Texturing
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3141682678
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3141682678 
045 0 |b d20241204 
100 1 |a Huang, Zehuan 
245 1 |a MV-Adapter: Multi-view Consistent Image Generation Made Easy 
260 |b Cornell University Library, arXiv.org  |c Dec 4, 2024 
513 |a Working Paper 
520 3 |a Existing multi-view image generation methods often make invasive modifications to pre-trained text-to-image (T2I) models and require full fine-tuning, leading to (1) high computational costs, especially with large base models and high-resolution images, and (2) degradation in image quality due to optimization difficulties and scarce high-quality 3D data. In this paper, we propose the first adapter-based solution for multi-view image generation, and introduce MV-Adapter, a versatile plug-and-play adapter that enhances T2I models and their derivatives without altering the original network structure or feature space. By updating fewer parameters, MV-Adapter enables efficient training and preserves the prior knowledge embedded in pre-trained models, mitigating overfitting risks. To efficiently model the 3D geometric knowledge within the adapter, we introduce innovative designs that include duplicated self-attention layers and parallel attention architecture, enabling the adapter to inherit the powerful priors of the pre-trained models to model the novel 3D knowledge. Moreover, we present a unified condition encoder that seamlessly integrates camera parameters and geometric information, facilitating applications such as text- and image-based 3D generation and texturing. MV-Adapter achieves multi-view generation at 768 resolution on Stable Diffusion XL (SDXL), and demonstrates adaptability and versatility. It can also be extended to arbitrary view generation, enabling broader applications. We demonstrate that MV-Adapter sets a new quality standard for multi-view image generation, and opens up new possibilities due to its efficiency, adaptability and versatility. 
653 |a Image degradation 
653 |a Image resolution 
653 |a Image quality 
653 |a Versatility 
653 |a Image processing 
653 |a Parameters 
653 |a Quality standards 
653 |a Adapters 
653 |a Texturing 
700 1 |a Guo, Yuan-Chen 
700 1 |a Wang, Haoran 
700 1 |a Yi, Ran 
700 1 |a Ma, Lizhuang 
700 1 |a Cao, Yan-Pei 
700 1 |a Sheng, Lu 
773 0 |t arXiv.org  |g (Dec 4, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3141682678/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.03632
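The abstract (field 520 above) describes the adapter as adding duplicated self-attention layers in a parallel attention architecture to a frozen pre-trained T2I model. The sketch below is only an illustration of that general idea under assumptions, not the authors' implementation: the class name ParallelMVAttention, the zero-initialized output projection, the head count, and the batch-of-views layout are all hypothetical; only the concept of a trainable, duplicated multi-view attention branch running in parallel with the frozen spatial attention follows the abstract.

```python
# Minimal sketch, assuming a PyTorch-style frozen base attention layer.
# Names and layout choices here are illustrative assumptions, not MV-Adapter's code.
import torch
import torch.nn as nn

class ParallelMVAttention(nn.Module):
    """Adds a duplicated, trainable multi-view self-attention branch in
    parallel to a frozen pre-trained spatial self-attention layer."""

    def __init__(self, base_attn: nn.MultiheadAttention, dim: int, num_views: int):
        super().__init__()
        self.base_attn = base_attn                 # pre-trained spatial attention, kept frozen
        for p in self.base_attn.parameters():
            p.requires_grad_(False)
        # Duplicated self-attention layer that attends across all views of one object
        self.mv_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.out_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.out_proj.weight)       # zero-init: adapter starts as identity
        nn.init.zeros_(self.out_proj.bias)
        self.num_views = num_views

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * num_views, tokens, dim), views of one object grouped together
        spatial, _ = self.base_attn(x, x, x)       # original per-view attention path
        bv, n, d = x.shape
        b = bv // self.num_views
        mv = x.reshape(b, self.num_views * n, d)   # let tokens from all views interact
        mv, _ = self.mv_attn(mv, mv, mv)
        mv = self.out_proj(mv).reshape(bv, n, d)
        return spatial + mv                        # parallel branches are summed

if __name__ == "__main__":
    dim, views = 64, 4
    base = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
    layer = ParallelMVAttention(base, dim, num_views=views)
    x = torch.randn(2 * views, 16, dim)            # 2 objects x 4 views, 16 tokens each
    print(layer(x).shape)                          # torch.Size([8, 16, 64])
```

Because the new branch's output projection is zero-initialized and the base weights stay frozen, the wrapped layer initially reproduces the pre-trained model exactly, which is one plausible way to realize the "plug-and-play, without altering the original network structure or feature space" property the abstract claims.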