Altogether: Image Captioning via Re-aligning Alt-text

Bibliographic Details
Published in: arXiv.org (Dec 12, 2024), p. n/a
Main Author: Hu, Xu
Other Authors: Po-Yao, Huang; Tan, Xiaoqing Ellen; Ching-Feng Yeh; Kahn, Jacob; Jou, Christine; Ghosh, Gargi; Levy, Omer; Zettlemoyer, Luke; Wen-tau Yih; Shang-Wen, Li; Xie, Saining; Feichtenhofer, Christoph
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3119817418
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3119817418 
045 0 |b d20241212 
100 1 |a Hu, Xu 
245 1 |a Altogether: Image Captioning via Re-aligning Alt-text 
260 |b Cornell University Library, arXiv.org  |c Dec 12, 2024 
513 |a Working Paper 
520 3 |a This paper focuses on creating synthetic data to improve the quality of image captions. Existing works typically have two shortcomings. First, they caption images from scratch, ignoring existing alt-text metadata, and second, lack transparency if the captioners' training data (e.g. GPT) is unknown. In this paper, we study a principled approach Altogether based on the key idea to edit and re-align existing alt-texts associated with the images. To generate training data, we perform human annotation where annotators start with the existing alt-text and re-align it to the image content in multiple rounds, consequently constructing captions with rich visual concepts. This differs from prior work that carries out human annotation as a one-time description task solely based on images and annotator knowledge. We train a captioner on this data that generalizes the process of re-aligning alt-texts at scale. Our results show our Altogether approach leads to richer image captions that also improve text-to-image generation and zero-shot image classification tasks. 
653 |a Image annotation 
653 |a Image classification 
653 |a Visual tasks 
653 |a Annotations 
653 |a Image quality 
653 |a Texts 
653 |a Image processing 
653 |a Human performance 
653 |a Synthetic data 
700 1 |a Po-Yao, Huang 
700 1 |a Tan, Xiaoqing Ellen 
700 1 |a Ching-Feng Yeh 
700 1 |a Kahn, Jacob 
700 1 |a Jou, Christine 
700 1 |a Ghosh, Gargi 
700 1 |a Levy, Omer 
700 1 |a Zettlemoyer, Luke 
700 1 |a Wen-tau Yih 
700 1 |a Shang-Wen, Li 
700 1 |a Xie, Saining 
700 1 |a Feichtenhofer, Christoph 
773 0 |t arXiv.org  |g (Dec 12, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3119817418/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2410.17251