Indexing and querying of XML data using modified Prufer sequences

 

M S thesis, by K Hima Prasad, July 2006.

 

Holistic processing of twig queries on a given set

of XML documents is considered as a core operation in XML

database research. Sequence based XML query processing is an

approach to process twig queries holistically. In this thesis, we

propose a new a way of transforming an XML document into a

sequence based on a variation of Prufer sequences, called Modified

Prufer Sequence. Twig patterns can be found by converting

both XML documents and twig queries into Modified Prufer

Sequences and by performing a non-contiguous subsequence

matching of the query sequence onto the data sequence. The

sequence data is indexed using a two level B+-tree to support

fast subsequence matching. We propose efficient algorithms for

processing both ordered and un-ordered twig queries. Proposed

twig matching algorithm generates results without any post processing

and results are free of false positives and duplicates.

We also present an optimization for parent-child axis along with

other optimizations.

 

Unlike other sequence based approaches, we show that our

sequencing and indexing mechanism can be extended to support

updates without regenerating the whole sequence every time an

update occurs in an XML document. The main idea is to use

monotonically increasing real numbers as position numbers in the

sequence so that appropriate gaps can be left for future insertions.

We propose algorithms for handling all kinds of updates, namely

insert, delete, rename and replace. Our experimental results show

that our system outperforms existing systems in terms of both

CPU and I/O costs.