Extended join algorithms and path based approach for XPATH evaluation,

 

M S thesis, by M Archana, March 2006.

 

XML(eXtensible Markup Language) is gaining prevalence in the representation and

exchange of web data. XML document can be modeled as a tree and the queries specify

selection predicates on this tree structure. XPath queries can specify axis such as

parent-child(P-C), ancestor-descendant(A-D) and following-preceding(F-P). Efficient

evaluation of XPath queries is a major research issue. Structural join techniques are

well known for query evaluation. Stack-tree algorithms are structural join algorithms

which advocate the use of A-D algorithms to evaluate the P-C axis also. Though

P-C axis generates less number of results than A-D axis, this approach needs same

processing overhead to evaluate both the axes. To reduce the computation overhead

for evaluating P-C axis, we propose algorithms exclusively for P-C axis. These join

algorithms use interval encoding to uniquely identify the nodes.

Current algorithms concentrate mainly on A-D and P-C axis but pay little attention

towards evaluating F-P axis. But F-P axis is also useful in some queries. In the

framework of interval encoding, algorithms for all the axes are present except for the

F-P axis in the literature. To fiill this gap, we also propose structural join algorithms

to evaluate F-P axis. The proposed algorithms have linear time complexity.

The join techniques for XPath evaluation are costly as they need large number

of disk accesses. Path based techniques can reduce the disk I/O cost by reducing the

number of joins. Existing path based approaches store pathIDs along with interval

encoding labels for all the elements present in the document. These approaches

have di_culties in assigning pathIDs to the documents which are deep and have large

number of distinct tags and they require reassigning of the labels in case of updates to

the document. To overcome these drawbacks we use structural summary techniques to

assign pathIDs. We propose novel metadata guided query evaluation scheme which

uses string matching technique to solve the queries. Our experimental evaluation

shows that the proposed algorithms perform better than the existing algorithms.