Resolving Linguistic Asymmetry: Forging Symmetric Multilingual Embeddings Through Asymmetric Contrastive and Curriculum Learning
| Published in: | Symmetry, vol. 17, no. 9 (2025), pp. 1386-1407 |
|---|---|
| Published: | MDPI AG |
| Online Access: | Citation/Abstract, Full Text + Graphics, Full Text - PDF |
| Abstract: | The pursuit of universal, symmetric semantic representations within large language models (LLMs) faces a fundamental challenge: the inherent asymmetry of natural languages. Different languages exhibit vast disparities in syntactic structure, lexical choice, and cultural nuance, making the creation of a truly shared, symmetric embedding space a non-trivial task. This paper addresses the problem by introducing a novel framework for forging robust, symmetric multilingual sentence embeddings. Our approach, DACL (Dynamic Asymmetric Contrastive Learning), is anchored in two asymmetric learning paradigms: Contrastive Learning and Dynamic Curriculum Learning (DCL). We extend Contrastive Learning to the multilingual setting, where it asymmetrically treats semantically equivalent sentences from different languages (positive pairs) and sentences with distinct meanings (negative pairs) to enforce semantic symmetry in the target embedding space. To refine this process further, we incorporate Dynamic Curriculum Learning, which introduces a second layer of asymmetry by dynamically scheduling training instances from easy to hard. This dual-asymmetric strategy enables the model to progressively master complex cross-lingual relationships, starting with obvious semantic equivalences and advancing to subtler ones. Comprehensive experiments on benchmark cross-lingual tasks, including sentence retrieval and cross-lingual classification (XNLI, PAWS-X, MLDoc, MARC), demonstrate that DACL significantly outperforms a wide range of established baselines, excelling in particular on tasks involving complex linguistic asymmetries. Ultimately, this work contributes a dual-asymmetric learning framework that leverages linguistic asymmetry to achieve robust semantic symmetry across languages, offering insights for developing more capable, fair, and interpretable multilingual LLMs and underscoring that deliberately exploiting asymmetry in the learning process is a highly effective strategy. |
|---|---|
| ISSN: | 2073-8994 |
| DOI: | 10.3390/sym17091386 |
| Source: | Engineering Database |
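The record above does not include DACL's implementation, but the contrastive component the abstract describes follows a well-known pattern: translation pairs are pulled together while other sentences in the batch serve as negatives. The following is a minimal sketch under that assumption, using a standard InfoNCE-style objective with in-batch negatives; the function name, `temperature` value, and symmetric two-direction formulation are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def multilingual_info_nce(src_emb: torch.Tensor,
                          tgt_emb: torch.Tensor,
                          temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss over a batch of translation pairs.

    src_emb[i] and tgt_emb[i] embed semantically equivalent sentences
    in two languages (a positive pair); every other row in the batch
    acts as an in-batch negative.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    # Cosine-similarity matrix; diagonal entries are the positives.
    logits = src @ tgt.T / temperature
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric objective: align src -> tgt and tgt -> src.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```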
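The abstract states that training instances are scheduled dynamically from easy to hard but gives no scheduling rule. The sketch below assumes a simple competence-based scheme in which each pair's current loss proxies its difficulty and the admitted fraction of the batch grows linearly over training; `curriculum_mask`, the linear schedule, and `min_fraction` are all hypothetical stand-ins for whatever DACL actually uses.

```python
import torch


def curriculum_mask(pair_losses: torch.Tensor,
                    step: int,
                    total_steps: int,
                    min_fraction: float = 0.2) -> torch.Tensor:
    """Boolean mask admitting the easiest training pairs first.

    Difficulty is approximated by each pair's current loss; the
    admitted fraction of the batch grows linearly from min_fraction
    to 1.0 over training (an easy-to-hard schedule).
    """
    progress = min(1.0, step / max(1, total_steps))
    fraction = min_fraction + (1.0 - min_fraction) * progress
    k = max(1, int(fraction * pair_losses.numel()))
    # Keep the k lowest-loss (easiest) pairs; ties at the threshold
    # may admit a few extra pairs, which is harmless here.
    threshold = torch.topk(pair_losses, k, largest=False).values.max()
    return pair_losses <= threshold
```

To combine the two sketches, one would compute per-pair losses with `F.cross_entropy(logits, labels, reduction='none')`, apply `curriculum_mask` to those losses, and average only the surviving entries before backpropagating.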