Heterogeneous Web Data Extraction Algorithm Based On Modified Hidden Conditional Random Fields

Bibliographic Details
Published in: Journal of Networks vol. 9, no. 4 (Apr 2014), p. 993-999
Main author: Cui, Cheng
Published:
Academy Publisher
Description
Abstract: This paper proposes a heterogeneous Web data extraction algorithm based on a modified hidden conditional random fields model. Because traditional linear-chain conditional random fields cannot effectively handle complex, heterogeneous Web data extraction, the author modifies the standard hidden conditional random fields model in three respects, which involve using a hidden Markov model to compute the hidden variables and training the model in two stages. In the first stage, each training sequence is learned with a hidden Markov model, so that the hidden variables become visible. In the second stage, the model parameters are learned for a given sequence. Experiments evaluate performance on two standard datasets, the "EData dataset" and the "Research Papers dataset". Compared with existing Web data extraction methods, the proposed algorithm extracts useful information from heterogeneous Web data effectively and efficiently.
ISSN:1796-2056
Source: Advanced Technologies & Aerospace Database
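
The first stage described in the abstract, in which a hidden Markov model makes the hidden variables visible for each training sequence, can be illustrated with a minimal Viterbi decoding sketch. This is not the author's implementation: the function name `viterbi`, the two states ("markup", "content"), the token types, and every probability below are hypothetical values chosen only for illustration, and the model parameters are assumed to be given rather than learned from data.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for one observation sequence.

    All probabilities are assumed to be strictly positive, so taking
    logarithms is safe.
    """
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    # back[t][s]: predecessor of state s on that best path.
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: each Web token is either page markup or content.
states = ("content", "markup")
start_p = {"content": 0.4, "markup": 0.6}
trans_p = {
    "content": {"content": 0.7, "markup": 0.3},
    "markup": {"content": 0.5, "markup": 0.5},
}
emit_p = {
    "content": {"tag": 0.1, "word": 0.6, "number": 0.3},
    "markup": {"tag": 0.7, "word": 0.2, "number": 0.1},
}

# One toy "training sequence" of observed token types.
obs = ["tag", "word", "number"]
decoded = viterbi(obs, states, start_p, trans_p, emit_p)
print(decoded)  # ['markup', 'content', 'content']
```

Once every training sequence has a decoded state path like this, the formerly hidden variables can be treated as observed, which is what would allow a second stage to learn the remaining model parameters with ordinary supervised training. How the paper actually performs either stage is not specified in this record.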