Heterogeneous Web Data Extraction Algorithm Based On Modified Hidden Conditional Random Fields
| Published in: | Journal of Networks vol. 9, no. 4 (Apr 2014), p. 993-999 |
|---|---|
| Main author: | |
| Published: | Academy Publisher |
| Abstract: | In this paper, the author proposes a novel heterogeneous Web data extraction algorithm based on a modified hidden conditional random fields model. Because traditional linear-chain conditional random fields cannot effectively handle complex, heterogeneous Web data extraction, the author modifies the standard hidden conditional random fields model in three respects, including using a hidden Markov model to compute the hidden variables and training the modified model in two stages. In the first stage, each training data sequence is learned with a hidden Markov model, which makes the hidden variables observable. In the second stage, the model parameters are learned for a given sequence. Finally, performance is evaluated experimentally on two standard datasets, the "EData dataset" and the "Research Papers dataset". Compared with existing Web data extraction methods, the proposed algorithm extracts useful information from heterogeneous Web data effectively and efficiently. |
| ISSN: | 1796-2056 |
| Source: | Advanced Technologies & Aerospace Database |