PMGAN: pretrained model-based generative adversarial network for text-to-image generation

Published: The Visual Computer, vol. 41, no. 1 (Jan 2025), p. 303
Publisher: Springer Nature B.V.
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3159547532
003 UK-CbPIL
022 |a 0178-2789 
022 |a 1432-2315 
024 7 |a 10.1007/s00371-024-03326-1  |2 doi 
035 |a 3159547532 
045 2 |b d20250101  |b d20250131 
245 1 |a PMGAN: pretrained model-based generative adversarial network for text-to-image generation 
260 |b Springer Nature B.V.  |c Jan 2025 
513 |a Journal Article 
520 3 |a Text-to-image generation is a challenging task. Although diffusion models can generate high-quality images of complex scenes, they sometimes suffer from a lack of realism, often produce widely divergent images from different texts with the same semantics, and sometimes render insufficient detail. Generative adversarial networks, by contrast, can generate realistic, content-consistent images that match the text descriptions. In this paper, we argue that generating images that are more consistent with the text descriptions is more important than generating higher-quality images. We therefore propose the pretrained model-based generative adversarial network (PMGAN), which employs multiple pre-trained models in both the generator and the discriminator. Specifically, in the generator, the deep attentional multimodal similarity model (DAMSM) text encoder extracts word and sentence embeddings from the input text, and the contrastive language-image pre-training (CLIP) text encoder extracts initial image features from the input text. In the discriminator, a pre-trained CLIP image encoder extracts image features from the input image. The CLIP encoders map text and images into a common semantic space, which is beneficial for generating high-quality images. Experimental results show that, compared with state-of-the-art methods, PMGAN achieves better inception score and Fréchet inception distance, and produces higher-quality images while maintaining greater consistency with the text descriptions. 
653 |a Diffusion models 
653 |a Descriptions 
653 |a Semantics 
653 |a Methods 
653 |a Deep learning 
653 |a Image quality 
653 |a Image processing 
653 |a Coders 
653 |a Discriminators 
653 |a Natural language 
653 |a Generative adversarial networks 
773 0 |t The Visual Computer  |g vol. 41, no. 1 (Jan 2025), p. 303 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3159547532/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3159547532/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3159547532/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
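
To make the architecture described in the abstract concrete, the sketch below shows how pre-trained CLIP encoders can map text and images into a common semantic space, roughly in the roles the abstract assigns them (text encoder feeding the generator, image encoder feeding the discriminator). It is a minimal sketch assuming PyTorch and OpenAI's clip package; the toy generator, the 512-dimensional feature size, and the omission of the DAMSM word/sentence embeddings are illustrative assumptions, not the paper's actual implementation.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP supplies a text encoder and an image encoder that share a common
# 512-dimensional semantic space (for the ViT-B/32 checkpoint).
clip_model, _ = clip.load("ViT-B/32", device=device)

tokens = clip.tokenize(["a small bird with a red head and a white belly"]).to(device)

with torch.no_grad():
    # Generator side: CLIP text features act as initial image features.
    text_features = clip_model.encode_text(tokens).float()   # shape (1, 512)

# Hypothetical toy generator: maps the 512-d text features to a 64x64 RGB image.
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 3 * 64 * 64),
    torch.nn.Tanh(),
).to(device)

fake = generator(text_features).view(1, 3, 64, 64)

# Discriminator side: the frozen CLIP image encoder extracts image features.
# CLIP expects 224x224 input, so the generated image is resized first.
fake_224 = torch.nn.functional.interpolate(
    fake, size=224, mode="bilinear", align_corners=False
)
image_features = clip_model.encode_image(fake_224.to(clip_model.dtype)).float()

# Because both embeddings live in the same space, cosine similarity provides a
# text-image consistency signal a discriminator or loss term could use.
sim = torch.nn.functional.cosine_similarity(image_features, text_features)
print(f"text-image consistency: {sim.item():.3f}")

In the paper itself the discriminator presumably consumes the CLIP image features alongside the DAMSM word and sentence embeddings; this snippet only illustrates the shared embedding space that makes such a text-image consistency signal possible.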