Semantic Annotation Model and Method Based on Internet Open Dataset

Bibliographic Details
Published in:	International Journal of Intelligent Information Technologies vol. 21, no. 1 (2025), p. 1-20
Main Author:	Gao, Xin
Other Authors:	Wang, Yansong; Wang, Fang; Zhang, Baoqun; Hu, Caie; Wang, Jian; Ma, Longfei
Published:	IGI Global
Subjects:	Accuracy; Internet; Datasets; Ontology; Information retrieval; Data mining; Context; Supervised learning; Organization theory; Labeling; Data processing; Data analysis; Semantic web; Annotations; Information sharing; Efficiency; Speech; Semantics; Decision making; Electric power; Information systems; Natural language processing; Methods; Resource Description Framework-RDF; Information technology; Cultural heritage
Online Access:	Citation/Abstract; Full Text - PDF

Description
Abstract:	Traditional semantic annotation struggles with dataset diversity: different fields and scenarios each require dedicated annotation, and the work usually demands substantial manpower and time. To address these challenges, this paper studies a semantic annotation model and method based on internet open datasets, aiming to improve annotation efficiency and accuracy and to promote the sharing and use of data resources. The paper selects the Common Crawl dataset to provide sufficient training samples, preprocesses the data with methods such as stop-word removal and deduplication to improve data quality, and constructs a keyword extraction model based on heuristic rules and text context. For the semantic annotation model, it constructs a model based on Bidirectional Long Short-Term Memory (BiLSTM) that uses the part-of-speech information in the corpus context, captures the part-of-speech features of the corpus, and generates semantic tags through supervised learning.
ISSN:	1548-3657 1548-3665
DOI:	10.4018/IJIIT.370966
Source:	Engineering Database
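
Illustrative note: the abstract mentions preprocessing by stop-word removal and deduplication. The following is a minimal Python sketch of what such a step might look like, not the authors' implementation. The stop-word list, the function names `remove_stop_words` and `preprocess`, and the sample documents are all assumptions made for illustration; this record does not describe the paper's actual rules.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real pipeline would use a much fuller list.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on", "for"}
)


def remove_stop_words(text):
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(documents):
    """Tokenize each document and drop documents whose token sequence was already seen.

    Deduplication here is exact matching on the normalized token sequence,
    keeping the first occurrence. Documents left empty after stop-word
    removal are skipped.
    """
    seen = set()
    cleaned = []
    for doc in documents:
        tokens = remove_stop_words(doc)
        key = tuple(tokens)
        if not tokens or key in seen:
            continue
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


# Made-up sample documents; the second is a near-verbatim repeat of the first.
docs = [
    "The semantic web is built on RDF.",
    "the Semantic Web is built on RDF",
    "Annotation of open datasets",
]
print(preprocess(docs))
```

Matching on the normalized token sequence is only one possible deduplication criterion; a web-scale corpus such as Common Crawl would more plausibly need near-duplicate detection, which this sketch does not attempt.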