Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Published in: arXiv.org (Dec 18, 2024), p. n/a
Main author: Han, Seungwook
Other authors: Song, Jinyeop; Gore, Jeff; Agrawal, Pulkit
Published:
Cornell University Library, arXiv.org
Topics: Predictive control; Algorithms; Encoding-Decoding; Large language models; Context; Failure modes; Task complexity; Representations; Adaptive learning
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3147264514
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3147264514 
045 0 |b d20241218 
100 1 |a Han, Seungwook 
245 1 |a Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers 
260 |b Cornell University Library, arXiv.org  |c Dec 18, 2024 
513 |a Working Paper 
520 3 |a Humans distill complex experiences into fundamental abstractions that enable rapid learning and adaptation. Similarly, autoregressive transformers exhibit adaptive learning through in-context learning (ICL), which raises the question of how. In this paper, we propose a concept encoding-decoding mechanism to explain ICL by studying how transformers form and use internal abstractions in their representations. On synthetic ICL tasks, we analyze the training dynamics of a small transformer and report the coupled emergence of concept encoding and decoding. As the model learns to encode different latent concepts (e.g., "finding the first noun in a sentence") into distinct, separable representations, it concurrently builds conditional decoding algorithms and improves its ICL performance. We validate the existence of this mechanism across pretrained models of varying scales (Gemma-2 2B/9B/27B, Llama-3.1 8B/70B). Further, through mechanistic interventions and controlled finetuning, we demonstrate that the quality of concept encoding is causally related to, and predictive of, ICL performance. Our empirical insights shed light on the success and failure modes of large language models via their representations. 
653 |a Predictive control 
653 |a Algorithms 
653 |a Encoding-Decoding 
653 |a Large language models 
653 |a Context 
653 |a Failure modes 
653 |a Task complexity 
653 |a Representations 
653 |a Adaptive learning 
700 1 |a Song, Jinyeop 
700 1 |a Gore, Jeff 
700 1 |a Agrawal, Pulkit 
773 0 |t arXiv.org  |g (Dec 18, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3147264514/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.12276
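
Note: the abstract reports that ICL ability tracks how separably a model encodes latent concepts in its hidden representations. As a minimal, hedged sketch of one standard way to test such separability (a linear probe on a transformer's hidden states; the model name, probed layer, and toy prompts below are placeholder assumptions, not the paper's protocol):

    # Illustrative sketch only: probe whether two latent concepts are
    # linearly separable in a transformer's hidden states. Placeholder
    # choices: "gpt2" stands in for the Gemma-2/Llama-3.1 models the
    # paper actually studies; LAYER and the prompts are toy examples.
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    MODEL = "gpt2"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModel.from_pretrained(MODEL, output_hidden_states=True)
    model.eval()

    # Toy prompts instantiating two hypothetical latent concepts:
    # concept 0 = "find the first noun", concept 1 = "find the first verb".
    prompts = [
        ("The cat sat on the mat. -> cat", 0),
        ("A dog chased the ball. -> dog", 0),
        ("The cat sat on the mat. -> sat", 1),
        ("A dog chased the ball. -> chased", 1),
    ]

    LAYER = 6  # placeholder layer index to probe
    features, labels = [], []
    with torch.no_grad():
        for text, concept in prompts:
            inputs = tokenizer(text, return_tensors="pt")
            hidden = model(**inputs).hidden_states[LAYER]  # (1, seq_len, dim)
            features.append(hidden[0, -1].numpy())  # last-token representation
            labels.append(concept)

    # A linear probe: separable concept encodings yield high accuracy.
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    print("probe accuracy:", probe.score(features, labels))

Under the abstract's claim, probe accuracy on such held-out prompts should rise as concept encoding emerges during training; this sketch only shows the measurement idea, not the paper's interventions or finetuning experiments.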