HierLabelNet: A Two-Stage LLMs Framework with Data Augmentation and Label Selection for Geographic Text Classification

Bibliographic Details
Published in: ISPRS International Journal of Geo-Information vol. 14, no. 7 (2025), p. 268-286
Main Author: Chen Zugang
Other Authors: Zhao, Le
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3233222677
003 UK-CbPIL
022 |a 2220-9964 
024 7 |a 10.3390/ijgi14070268  |2 doi 
035 |a 3233222677 
045 2 |b d20250101  |b d20251231 
084 |a 231472  |2 nlm 
100 1 |a Chen Zugang  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; cszl@gs.zzu.edu.cn 
245 1 |a HierLabelNet: A Two-Stage LLMs Framework with Data Augmentation and Label Selection for Geographic Text Classification 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Earth observation data serve as a fundamental resource in Earth system science. The rapid advancement of remote sensing and in situ measurement technologies has led to the generation of massive volumes of data, accompanied by a growing body of geographic textual information. Efficient and accurate classification and management of these geographic texts has become a critical challenge in the field. However, the effectiveness of traditional classification approaches is hindered by several issues, including data sparsity, class imbalance, semantic ambiguity, and the prevalence of domain-specific terminology. To address these limitations and enable the intelligent management of geographic information, this study proposes an efficient geographic text classification framework based on large language models (LLMs), tailored to the unique semantic and structural characteristics of geographic data. Specifically, LLM-based data augmentation strategies are employed to mitigate the scarcity of labeled data and class imbalance. A semantic vector database is utilized to filter the label space prior to inference, enhancing the model’s adaptability to diverse geographic terms. Furthermore, few-shot prompt learning guides LLMs in understanding domain-specific language, while an output alignment mechanism improves classification stability for complex descriptions. This approach offers a scalable solution for the automated semantic classification of geographic text for unlocking the potential of ever-expanding geospatial big data, thereby advancing intelligent information processing and knowledge discovery in the geospatial domain. 
653 |a Sparsity 
653 |a Text categorization 
653 |a Accuracy 
653 |a Labels 
653 |a Data processing 
653 |a Datasets 
653 |a Classification 
653 |a Big Data 
653 |a Domain specific languages 
653 |a Remote sensing 
653 |a In situ measurement 
653 |a Large language models 
653 |a Terminology 
653 |a Data augmentation 
653 |a Semantics 
653 |a Information processing 
653 |a Natural language processing 
653 |a Annotations 
653 |a Information retrieval 
700 1 |a Zhao, Le  |u Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; cszl@gs.zzu.edu.cn 
773 0 |t ISPRS International Journal of Geo-Information  |g vol. 14, no. 7 (2025), p. 268-286 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3233222677/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3233222677/fulltextwithgraphics/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3233222677/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch