SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

Bibliographic Details
Published in:	arXiv.org (Dec 6, 2024), p. n/a
Main Author:	Li, Suyi
Other Authors:	Yang, Lingyun, Jiang, Xiaoxiao, Lu, Hanfeng, An, Dakai, Zhipeng Di, Lu, Weiyi, Chen, Jiawei, Liu, Kan, Yu, Yinghao, Tao, Lan, Yang, Guodong, Qu, Lin, Zhang, Liping, Wang, Wei
Published:	Cornell University Library, arXiv.org
Subjects:	Parallel processing; Modules; Image quality; Image processing; Workflow; Effectiveness
Online Access:	Citation/Abstract; Full text outside of ProQuest

Description
Abstract:	Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow where a base diffusion model is augmented with various "add-on" modules, notably ControlNet and LoRA, to enhance image generation control. Compared to serving the base model alone, these add-on modules introduce significant loading and computational overhead, resulting in increased latency. In this paper, we present SwiftDiffusion, a system that efficiently serves a T2I workflow through a holistic approach. SwiftDiffusion decouples ControlNet from the base model and deploys it as a separate, independently scaled service on dedicated GPUs, enabling ControlNet caching, parallelization, and sharing. To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL) technique, allowing LoRA loading to overlap with the initial base model execution by up to k steps without compromising image quality. Furthermore, SwiftDiffusion optimizes base model execution with a novel latent parallelism technique. Collectively, these designs enable SwiftDiffusion to outperform the state-of-the-art T2I serving systems, achieving up to 7.8x latency reduction and 1.6x throughput improvement in serving SDXL models on H800 GPUs, without sacrificing image quality.
ISSN:	2331-8422
Source:	Engineering Database
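
The abstract's bounded asynchronous LoRA loading (BAL) idea, where loading overlaps with at most the first k base-model steps, can be illustrated with a small toy sketch. This is not the paper's implementation: `run_with_bounded_async_load`, `load_lora`, `denoise_step`, and the `gate` event are all hypothetical names, the "LoRA" is a placeholder dictionary, and no real diffusion library is called. The sketch only shows the control flow: steps before k proceed without waiting, and step k blocks until the load has finished.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_bounded_async_load(num_steps, k, load_lora, denoise_step):
    """Run denoising steps while a LoRA loads in a background thread.

    Steps 0..k-1 may run without the LoRA if it is not ready yet.
    From step k onward the loop blocks until the LoRA is available.
    Returns the list of values produced by denoise_step.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_lora)  # start loading in the background
        lora = None
        for step in range(num_steps):
            if lora is None:
                if step >= k:
                    lora = future.result()  # bound reached: wait for the load
                elif future.done():
                    lora = future.result()  # load finished early: use it now
            results.append(denoise_step(step, lora))
    return results


# Deterministic toy setup (all hypothetical): the load completes only after
# the event is set, which happens during step 2.
gate = threading.Event()


def load_lora():
    gate.wait(timeout=5)  # stands in for a slow weight fetch
    return {"name": "hypothetical-lora"}


def denoise_step(step, lora):
    if step == 2:
        gate.set()  # the simulated load becomes able to finish here
    return lora is not None  # report whether the LoRA was applied


applied = run_with_bounded_async_load(5, 3, load_lora, denoise_step)
print(applied)
```

With k = 3 in this toy run, the first three steps execute without the LoRA and the remaining steps execute with it. How a real system merges the weights mid-run, and how k is chosen so that image quality is preserved, is described in the paper itself and is not modeled here.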