Indexing and querying of XML data using modified Prufer sequences M S thesis, by K Hima Prasad, July 2006. Holistic processing of twig queries on a given set of XML documents is considered as a core operation in XML database research. Sequence based XML query processing is an approach to process twig queries holistically. In this thesis, we propose a new a way of transforming an XML document into a sequence based on a variation of Prufer sequences, called Modified Prufer Sequence. Twig patterns can be found by converting both XML documents and twig queries into Modified Prufer Sequences and by performing a non-contiguous subsequence matching of the query sequence onto the data sequence. The sequence data is indexed using a two level B+-tree to support fast subsequence matching. We propose efficient algorithms for processing both ordered and un-ordered twig queries. Proposed twig matching algorithm generates results without any post processing and results are free of false positives and duplicates. We also present an optimization for parent-child axis along with other optimizations. Unlike other sequence based approaches, we show that our sequencing and indexing mechanism can be extended to support updates without regenerating the whole sequence every time an update occurs in an XML document. The main idea is to use monotonically increasing real numbers as position numbers in the sequence so that appropriate gaps can be left for future insertions. We propose algorithms for handling all kinds of updates, namely insert, delete, rename and replace. Our experimental results show that our system outperforms existing systems in terms of both CPU and I/O costs. |