Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

Bibliographic Details
Published in: arXiv.org (Jun 8, 2024), p. n/a
Main Author: Munikoti, Sai
Other Authors: Stewart, Ian; Horawalavithana, Sameera; Kvinge, Henry; Emerson, Tegan; Thompson, Sandra E; Pazdernik, Karl
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3066583572
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3066583572 
045 0 |b d20240608 
100 1 |a Munikoti, Sai 
245 1 |a Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities 
260 |b Cornell University Library, arXiv.org  |c Jun 8, 2024 
513 |a Working Paper 
520 3 |a Multimodal models are expected to be a critical component to future advances in artificial intelligence. This field is starting to grow rapidly with a surge of new design elements motivated by the success of foundation models in natural language processing (NLP) and vision. It is widely hoped that further extending the foundation models to multiple modalities (e.g., text, image, video, sensor, time series, graph, etc.) will ultimately lead to generalist multimodal models, i.e. one model across different data modalities and tasks. However, there is little research that systematically analyzes recent multimodal models (particularly the ones that work beyond text and vision) with respect to the underlying architecture proposed. Therefore, this work provides a fresh perspective on generalist multimodal models (GMMs) via a novel architecture and training configuration specific taxonomy. This includes factors such as Unifiability, Modularity, and Adaptability that are pertinent and essential to the wide adoption and application of GMMs. The review further highlights key challenges and prospects for the field and guides researchers toward new advancements. 
653 |a Modularity 
653 |a Taxonomy 
653 |a Artificial intelligence 
653 |a Critical components 
653 |a Natural language processing 
700 1 |a Stewart, Ian 
700 1 |a Horawalavithana, Sameera 
700 1 |a Kvinge, Henry 
700 1 |a Emerson, Tegan 
700 1 |a Thompson, Sandra E 
700 1 |a Pazdernik, Karl 
773 0 |t arXiv.org  |g (Jun 8, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3066583572/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2406.05496