Efficient binary structural join algorithms on XML data M S Thesis, by G V Subramanyam, July 2005. XML is the widely used format for representing and exchanging semi-structured data on the web. XML queries naturally specify hierarchical patterns to select relevant parts of an XML document. These patterns have a sequence of selection predicates connected by operators representing structural relationships (either ancestor-descendant or preceding-following sibling). In this context, the operation of structural join involves discovering pairs of nodes that are structurally related from the cross product of two sets of nodes. Current XPath processing algorithms focus more on solving queries containing ancestor-descendant relationship and pay little attention to the equally important preceding-following sibling relationship. In this thesis, we present novel solutions for solving queries containing structural join operators that specify ancestor-descendant or sibling axes. For solving ancestor-descendant relationship, we present a solution that involves maintaining an extra piece of information, called nDesc, for each element in the XML document tree. The parameter nDesc specifies the number of descendants having same tag name as that of the element. Using nDesc, one can effectively skip most of the ancestors and descendants that do not participate in the join. We present anti-structural join, which is new to the database literature, for processing queries that specify don't-occur patterns. These patterns are specified using not boolean function in XPath. We provide approaches for constructing XML subtrees of the final output nodes that are obtained when a query is evaluated. We also present a linear time algorithm for processing queries containing sibling axes by using sibling-list at every level of the XML document tree. A sibling-list, at any instance, holds set of elements that are under common parent. During the join, we claim that each element in the element lists is visited once. An extensive experimental evaluation shows that our join algorithms perform significantly better than the currently existing join algorithms. |