Altogether: Image Captioning via Re-aligning Alt-text

Bibliographic data
Published in:	arXiv.org (Dec 12, 2024), p. n/a
Main author:	Xu, Hu
Other authors:	Huang, Po-Yao; Tan, Xiaoqing Ellen; Yeh, Ching-Feng; Kahn, Jacob; Jou, Christine; Ghosh, Gargi; Levy, Omer; Zettlemoyer, Luke; Yih, Wen-tau; Li, Shang-Wen; Xie, Saining; Feichtenhofer, Christoph
Publisher:	Cornell University Library, arXiv.org
Subjects:	Image annotation; Image classification; Visual tasks; Annotations; Image quality; Texts; Image processing; Human performance; Synthetic data

Description
Abstract:	This paper focuses on creating synthetic data to improve the quality of image captions. Existing works typically have two shortcomings: first, they caption images from scratch, ignoring existing alt-text metadata; second, they lack transparency when the captioner's training data (e.g., GPT's) is unknown. In this paper, we study a principled approach, Altogether, based on the key idea of editing and re-aligning the existing alt-texts associated with images. To generate training data, we perform human annotation in which annotators start from the existing alt-text and re-align it to the image content over multiple rounds, thereby constructing captions with rich visual concepts. This differs from prior work, which carries out human annotation as a one-time description task based solely on the image and the annotator's knowledge. We train a captioner on this data that generalizes the process of re-aligning alt-texts at scale. Our results show that the Altogether approach leads to richer image captions, which also improve text-to-image generation and zero-shot image classification.
ISSN:	2331-8422
Source:	Engineering Database