Transformer-Based Abstractive Summarization of Legal Texts in Low-Resource Languages

Saved in:
Bibliographic Details
Published in: Electronics vol. 14, no. 12 (2025), p. 2320-2341
Main Author: Masih Salman
Other Authors: Hassan Mehdi; Gillani, Fahad Labiba; Hassan Bilal
Publisher: MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3223907975
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14122320  |2 doi 
035 |a 3223907975 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Masih Salman  |u Department of Computer Science, Faculty of Computing and Artificial Intelligence (FCAI), Air University Sector E-9, Islamabad 44000, Pakistan; mehdi.hassan@au.edu.pk (M.H.) 
245 1 |a Transformer-Based Abstractive Summarization of Legal Texts in Low-Resource Languages 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a The emergence of large language models (LLMs) has revolutionized the trajectory of NLP research. Transformers with attention mechanisms, increased computational power, and massive datasets have led to the emergence of pre-trained large language models (PLLMs), which offer promising possibilities for multilingual applications in low-resource settings. However, the scarcity of annotated resources and suitably pre-trained models continues to pose a significant hurdle for low-resource abstractive summarization of legal texts, particularly in Urdu. This study presents a transfer learning approach using pre-trained multilingual large models (mBART and mT5 Small, Base, and Large) to generate abstractive summaries of Urdu legal texts. A curated dataset was developed with legal experts, who produced ground-truth summaries. The models were fine-tuned on this domain-specific corpus to adapt them for low-resource legal summarization. The experimental results demonstrate that mT5-Large, fine-tuned on Urdu legal texts, outperforms all other evaluated models on standard summarization metrics, achieving a ROUGE-1 score of 0.7889, a ROUGE-2 score of 0.5961, and a ROUGE-L score of 0.7813, indicating a strong capacity to generate fluent, coherent, and legally accurate summaries. The mT5-Base model follows closely with ROUGE-1 = 0.7774, while mT5-Small shows moderate performance (ROUGE-1 = 0.6406), with reduced fidelity in capturing legal structure. The mBART50 model, despite being fine-tuned on the same legal corpus, performs lower (ROUGE-1 = 0.5914), revealing its relative limitations in this domain. Notably, models trained or fine-tuned on non-legal, out-of-domain data, such as urT5 (ROUGE-1 = 0.3912), mT5-XLSUM (ROUGE-1 = 0.0582), and mBART50 (XLSUM) (ROUGE-1 = 0.0545), exhibit poor generalization to legal summaries, underscoring the necessity of domain adaptation in low-resource legal contexts. 
These findings highlight the effectiveness of fine-tuning multilingual LLMs for domain-specific tasks. The gains in legal summarization demonstrate the practical value of transfer learning in low-resource settings and the broader potential of AI-driven tools for legal document processing, information retrieval, and decision support. 
653 |a Language 
653 |a Datasets 
653 |a Data processing 
653 |a Large language models 
653 |a Summaries 
653 |a Legal documents 
653 |a Texts 
653 |a Information retrieval 
653 |a Massive data points 
653 |a Document management 
653 |a Natural language processing 
653 |a Multilingualism 
653 |a Machine learning 
653 |a Annotations 
653 |a Statistical methods 
653 |a Semantics 
700 1 |a Hassan Mehdi  |u Department of Computer Science, Faculty of Computing and Artificial Intelligence (FCAI), Air University Sector E-9, Islamabad 44000, Pakistan; mehdi.hassan@au.edu.pk (M.H.) 
700 1 |a Gillani, Fahad Labiba  |u Department of Computer Science, National University of Computer & Emerging Sciences, Islamabad 44000, Pakistan; labiba.fahad@nu.edu.pk 
700 1 |a Hassan Bilal  |u Faculty of Engineering & Environment, Northumbria University, London Campus, London E1 7HT, UK 
773 0 |t Electronics  |g vol. 14, no. 12 (2025), p. 2320-2341 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3223907975/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3223907975/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3223907975/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch