XPath: The XML Path Language


 

XPath, the XML Path Language, is a query language for selecting nodes from an XML document.

In simple terms, XPath is a way to find information in an XML document. XPath uses expressions to find elements, attributes, and other information in your XML. If you have an XML document that contains a list of your favorite books, each with author child elements, you can use a one-line XPath expression to find all the authors of your favorite books.

The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria.
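As a rough illustration of the "one-line expression" idea, the sketch below uses Python's standard-library xml.etree.ElementTree module, which implements only a limited subset of XPath. The favorites document and its contents are made-up sample data, not something taken from a real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: favorite books, each with an author child element.
FAVORITES = """
<favorites>
  <book><title>Harry Potter</title><author>J K. Rowling</author></book>
  <book><title>Learning XML</title><author>Erik T. Ray</author></book>
</favorites>
"""

root = ET.fromstring(FAVORITES)

# A single path expression finds every author element anywhere below the root.
authors = [author.text for author in root.findall(".//author")]
print(authors)
```

The expression .//author walks the tree from the root and picks out the author elements at any depth, which is the tree navigation described above.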


Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema, XForms and the Internationalization Tag Set (ITS).

XPath has been adopted by a number of XML processing libraries and tools, many of which also offer CSS Selectors, another W3C standard, as a simpler alternative to XPath.

The XBRL consumer API uses XPath internally to extract financial information from the XBRL document set.

XPath Terminology

Nodes

In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes.

XML documents are treated as trees of nodes. The topmost element of the tree is called the root element.

Look at the following XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>

Example of nodes in the XML document above:

<bookstore> (root element node)
<author>J K. Rowling</author> (element node)
lang="en" (attribute node)
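A small sketch of how these nodes can be reached from code, assuming Python's standard-library xml.etree.ElementTree. Note that ElementTree exposes attributes and text as properties of an element rather than as separate node objects, so this only approximates the XPath node model. The sample string assumes the title element carries the lang="en" attribute listed above.

```python
import xml.etree.ElementTree as ET

# Sample document; the lang attribute on title is assumed from the node list above.
DOC = """<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>"""

root = ET.fromstring(DOC)
author = root.find("book/author")
title = root.find("book/title")

print(root.tag)           # name of the root element
print(author.tag)         # name of an element node
print(author.text)        # text content of that element
print(title.get("lang"))  # value of the lang attribute
```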

Atomic values

Atomic values are plain values, such as strings or numbers, that have no children or parent.

Example of atomic values:

J K. Rowling
"en"

Items

Items are atomic values or nodes.


Relationship of Nodes

Parent

Each element and attribute has one parent.

In the following example, the book element is the parent of the title, author, year, and price elements:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

Children

Element nodes may have zero, one or more children.

In the following example, the title, author, year, and price elements are all children of the book element:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

Siblings

Nodes that have the same parent.

In the following example, the title, author, year, and price elements are all siblings:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

Ancestors

A node’s parent, parent’s parent, etc.

In the following example, the ancestors of the title element are the book element and the bookstore element:

<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>

Descendants

A node’s children, children’s children, etc.

In the following example, the descendants of the bookstore element are the book, title, author, year, and price elements:

<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
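The relationships above can be explored with a short sketch, assuming Python's standard-library xml.etree.ElementTree. The variable names are illustrative only, and ElementTree covers just a subset of XPath, so some relationships are reached through Python iteration rather than a path expression.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>"""

bookstore = ET.fromstring(DOC)
book = bookstore.find("book")

# Children: iterating over an element yields its child elements.
children = [child.tag for child in book]

# Parent: the ".." step moves from the title element up to its parent.
parent = bookstore.find(".//title/..")

# Siblings: the other children of the same parent.
siblings = [child.tag for child in parent if child.tag != "title"]

# Descendants: iter() walks the subtree and yields the element itself first,
# so the first entry is skipped.
descendants = [el.tag for el in bookstore.iter()][1:]

print(children, parent.tag, siblings, descendants)
```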

XPath uses path expressions to select nodes or node-sets in an XML document. A node is selected by following a path, that is, a sequence of steps.


The XML Example Document

We will use the following XML document in the examples below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>

<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>

<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>

</bookstore>


Selecting Nodes

The most useful path expressions are listed below:

Expression    Description
nodename      Selects all child elements of the current node that are named nodename
/             Selects from the root node
//            Selects matching nodes at any depth below the current node, no matter where they are
.             Selects the current node
..            Selects the parent of the current node
@             Selects attributes

In the table below we have listed some path expressions and the result of the expressions:

Path Expression    Result
bookstore          Selects all bookstore elements that are children of the current node
/bookstore         Selects the root element bookstore
bookstore/book     Selects all book elements that are children of bookstore
//book             Selects all book elements no matter where they are in the document
bookstore//book    Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang            Selects all attributes that are named lang

Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!
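A few of these expressions can be tried with a sketch like the following, assuming Python's standard-library xml.etree.ElementTree. That module supports only part of XPath: paths are written relative to the element being searched, and attribute selection such as //@lang has no direct equivalent, so it is approximated in plain Python here.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

# bookstore/book, written relative to the bookstore element itself
books = root.findall("book")

# //book: every book element at any depth below the root
all_books = root.findall(".//book")

# .. : the parent of each price element (each parent is reported once)
price_parents = root.findall(".//price/..")

# //@lang approximated in Python: collect the lang attribute values
langs = [el.get("lang") for el in root.iter() if "lang" in el.attrib]

print(len(books), len(all_books), [p.tag for p in price_parents], langs)
```

A full XPath engine would accept the absolute forms from the table directly.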

Predicates

Predicates are used to find a specific node or a node that contains a specific value.

Predicates are always embedded in square brackets.

In the table below we have listed some path expressions with predicates and the result of the expressions:

Path Expression                       Result
/bookstore/book[1]                    Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]               Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]             Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]         Selects the first two book elements that are children of the bookstore element
//title[@lang]                        Selects all the title elements that have an attribute named lang
//title[@lang='eng']                  Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]          Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

Note: IE5 and later treat [0] as the first node, but according to the W3C standard the first node is [1].
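The sketch below tries several of these predicates on the example document, assuming Python's standard-library xml.etree.ElementTree. Its XPath subset handles positional and attribute predicates but not numeric comparisons such as price>35.00, so that filter is written in plain Python as a stand-in.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

# Positional predicates; positions start at 1
first_title = root.find("book[1]/title").text
last_title = root.find("book[last()]/title").text
last_but_one_title = root.find("book[last()-1]/title").text

# Attribute predicate: title elements whose lang attribute equals 'eng'
eng_titles = [t.text for t in root.findall(".//title[@lang='eng']")]

# Stand-in for /bookstore/book[price>35.00]/title: compare prices in Python
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(first_title, last_title, last_but_one_title, eng_titles, expensive_titles)
```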

Selecting Unknown Nodes

XPath wildcards can be used to select unknown XML elements.

Wildcard    Description
*           Matches any element node
@*          Matches any attribute node
node()      Matches any node of any kind

In the table below we have listed some path expressions and the result of the expressions:

Path Expression    Result
/bookstore/*       Selects all the child element nodes of the bookstore element
//*                Selects all elements in the document
//title[@*]        Selects all title elements which have any attribute
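A brief sketch of the wildcard expressions, assuming Python's standard-library xml.etree.ElementTree. The * wildcard is supported there, while //* and //title[@*] are approximated with ordinary Python iteration in this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

# /bookstore/*: all child elements of bookstore
children = root.findall("*")

# //*: all elements in the document; iter() includes the root element itself
all_elements = list(root.iter())

# //title[@*] approximated in Python: title elements with at least one attribute
titles_with_attributes = [t for t in root.iter("title") if t.attrib]

print([el.tag for el in all_elements])
```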

Selecting Several Paths

By using the | operator in an XPath expression you can select several paths.

In the table below we have listed some path expressions and the result of the expressions:

Path Expression                    Result
//book/title | //book/price        Selects all the title AND price elements of all book elements
//title | //price                  Selects all the title AND price elements in the document
/bookstore/book/title | //price    Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
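Python's standard-library xml.etree.ElementTree does not implement the | operator, so the sketch below only imitates //title | //price by filtering elements in document order. An XPath engine with union support would take the expression as written.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>"""

root = ET.fromstring(DOC)

# Imitates //title | //price: keep every title or price element.
# iter() walks the tree in document order, so the results stay in that order.
title_or_price = [el for el in root.iter() if el.tag in ("title", "price")]
texts = [el.text for el in title_or_price]

print(texts)
```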

Reference:

http://en.wikipedia.org/wiki/XPath

http://www.tizag.com/xmlTutorial/xpathtutorial.php

 
