Parallel XML and XPath Parsing

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2018)
Main Author:	Zhang, Ying
Published:	ProQuest Dissertations & Theses
Subjects:	Computer science
Online Access:	Citation/Abstract Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	2039675307
003	UK-CbPIL
020			\|a 978-0-355-90697-4
035			\|a 2039675307
045	0		\|b d20180101
084			\|a 66569 \|2 nlm
100	1		\|a Zhang, Ying
245	1		\|a Parallel XML and XPath Parsing
260			\|b ProQuest Dissertations & Theses \|c 2018
513			\|a Dissertation/Thesis
520	3		\|a XML has been widely adopted across a spectrum of applications. Its parsing efficiency, however, remains a concern and can be a bottleneck. XPath is a query language used to locate and select content in an XML document, so improving the performance of XPath processing is important for many applications. With the prevalence of multicore CPUs, parallelization is one promising way to improve performance. This dissertation investigates approaches to parallelizing DOM-style XML parsing. We first developed an overall solution that decomposes the XML document into well-formed fragments at well-defined points, according to the output of an initial preparsing phase. We then focused on parallelizing the preparsing stage, which is the major bottleneck. Building on earlier research, we examined how speculation can be used to improve performance, using an approach we call a p-DFA, which avoids computing low-probability possibilities. Effectively parallelizing XPath is challenging. A large number of XPath queries is hard to divide evenly among processors. However, there are opportunities. First, many queries focus on different location steps, so they can be processed on different processors. Second, free processors can steal jobs from busy ones. The problem is how to keep a query consecutive if it has already executed some location steps. We investigated an approach that builds on YFilter and divides the NFA into several smaller ones for concurrent processing. We implemented and tested two strategies for load balancing: a static approach and a dynamic approach with work stealing. We also investigated parallel XPath parsing based on TwigM, which focuses on streaming data. Following the state machine created in advance, as specified in the TwigM algorithm, we created all the needed information from the partially received data. We then discuss how to divide tasks on the fly in two steps: the first step parses the XML and creates tasks at the same time; the second step assigns the parsed XPath tasks to multiple threads and finally merges the results. Experiments with the above approaches show good speedup and scalability.
653			\|a Computer science
773	0		\|t ProQuest Dissertations and Theses \|g (2018)
786	0		\|d ProQuest \|t ProQuest Dissertations & Theses Global
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/2039675307/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/2039675307/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch