- XPath 1.0
XPath (XML Path Language) is a language for selecting nodes from an XML document. In addition, XPath may be used to compute values (strings, numbers, or boolean values) from the content of an XML document. The current version of the language is XPath 2.0, but because version 1.0 is still the more widely used version, this article describes XPath 1.0.
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as "an XPath".
Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema and XForms.
Syntax and Semantics
The most important kind of expression in XPath is a "location path". A location path consists of a sequence of "location steps". Each location step has three components:
* an "axis"
* a "node test"
* and a "predicate".
An XPath expression is evaluated with respect to a "context node". An Axis Specifier such as 'child' or 'descendant' specifies the direction to navigate from the context node. The node test and the predicate are used to filter the nodes specified by the axis specifier. For example, the node test 'A' requires that all nodes navigated to must be named 'A'. A predicate can be used to specify that the selected nodes have certain properties, which are themselves specified by XPath expressions.
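As a rough illustration of context nodes, node tests and predicates, the sketch below uses Python's standard xml.etree.ElementTree module. That module implements only a small subset of XPath 1.0 (abbreviated syntax only, no explicit axis names), so it stands in for a full XPath processor here. The sample document and the variable names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<A>"
    "<B id='b1'><C/><D/></B>"
    "<B id='b2'><C/><C/></B>"
    "</A>"
)

# The element a path is evaluated against plays the role of the context node.
# Here the context node is the outermost A element; each step uses the
# default child axis with the node tests B and C.
c_from_a = doc.findall("B/C")

# The predicate [@id='b2'] filters the B nodes selected by the first step.
c_from_b2 = doc.findall("B[@id='b2']/C")

# The same kind of relative path gives a different result from another context node.
b1 = doc.find("B[@id='b1']")
c_from_b1 = b1.findall("C")

print(len(c_from_a), len(c_from_b2), len(c_from_b1))  # 3 2 1
```

Changing the context node changes the result of a relative path, which is why the three counts differ.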
Two notations are defined; the first, known as abbreviated syntax, is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The full syntax is more verbose, but allows for more options to be specified, and is more descriptive if read carefully.
Abbreviated syntax
The compact notation allows many defaults and abbreviations for common cases. Given source XML containing at least
<A> <B> <C/> </B> </A>
the simplest XPath takes a form such as
/A/B/C
which selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. XPath syntax is designed to mimic URI (Uniform Resource Identifier) syntax and file path syntax.
More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
A//B/*[1]
selects the first element ('[1]'), whatever its name ('*'), that is a child ('/') of a B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). If there are several suitable B elements in the document, this actually returns a set of all their first children.
Expanded syntax
In the full, unabbreviated syntax, the two examples above would be written
/child::A/child::B/child::C
child::A/descendant-or-self::node()/child::B/child::*[position()=1]
Here, in each step of the XPath, the axis (e.g. 'child' or 'descendant-or-self') is explicitly specified, followed by '::' and then the node test, such as 'A' or 'node()' in the examples above.
Axis specifiers
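The Axis Specifier indicates navigation direction within the tree representation of the XML document. The axes available are:
* 'ancestor'
* 'ancestor-or-self'
* 'attribute' (abbreviated '@')
* 'child' (the default axis, so 'child::' may be omitted)
* 'descendant'
* 'descendant-or-self' ('//' abbreviates '/descendant-or-self::node()/')
* 'following'
* 'following-sibling'
* 'namespace'
* 'parent' ('..' abbreviates 'parent::node()')
* 'preceding'
* 'preceding-sibling'
* 'self' ('.' abbreviates 'self::node()')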
As an example of using the attribute axis in abbreviated syntax,
//a/@href
selects the attribute called 'href' in 'a' elements anywhere in the document tree.
The expression '.' (an abbreviation for 'self::node()') is most commonly used within a predicate to refer to the currently selected node. For example,
h3[.='See also']
selects an element called 'h3' in the current context, whose text content is 'See also'.
Node tests
Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix 'gs' has been defined, '//gs:enquiry' will find all the 'enquiry' elements in that namespace, and '//gs:*' will find all elements, regardless of local name, in that namespace.
Other node test formats are:
;comment() :finds an XML comment node, e.g. <!-- Comment -->
;text() :finds a node of type text, e.g. the hello in <k>hello</k>
;processing-instruction() :finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
;node() :finds any node at all.
Predicates
Expressions of any complexity can be specified in square brackets; each must be satisfied before the preceding node will be matched by an XPath. For example,
//a[@href='help.php']
will match an 'a' element with an 'href' attribute whose value is 'help.php'.
There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.
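The behaviour of single, multiple and non-final predicates can be sketched with Python's standard xml.etree.ElementTree module. This is only an approximation: ElementTree accepts a limited subset of XPath 1.0 and, as far as is known, does not accept nested predicates, so those are not shown. The XHTML-like fragment and the variable names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment, invented for this illustration.
page = ET.fromstring(
    "<html lang='en'><body>"
    "<a href='help.php' target='_blank'>Help</a>"
    "<a href='help.php'>Help again</a>"
    "<a href='index.php' target='_top'>Home</a>"
    "</body></html>"
)

# One predicate: every a element whose href attribute equals help.php.
help_links = page.findall(".//a[@href='help.php']")

# Two predicates on the same step: both must hold for a node to be kept.
help_with_target = page.findall(".//a[@href='help.php'][@target]")

# Predicates are not confined to the last step: body[a] keeps only body
# elements that have at least one a child, then the next step filters again.
targeted_links = page.findall("body[a]/a[@target]")

print(len(help_links), len(help_with_target), len(targeted_links))  # 2 1 2
```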
When
//a[/html/@lang='en'][@href='help.php'][1]/@target
is applied to an XHTML document, it selects the 'target' attribute of each 'a' element that is the first among its sibling 'a' elements to have its 'href' attribute set to 'help.php', provided the document's 'html' top-level element also has a 'lang' attribute set to 'en'. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.
Predicate order is significant, however. Each predicate 'filters' a location step's selected node-set in turn.
//a[1][/html/@lang='en'][@href='help.php']/@target
will find a match only if an 'a' element that is the first 'a' child of its parent, in a document whose 'html' element has 'lang' set to 'en', also has 'href' set to 'help.php'.
Because '//' abbreviates '/descendant-or-self::node()/', the positional predicate in '//a[1]' is applied separately among the children of each parent, so it can match one 'a' element per parent. To select only the first 'a' element in the whole document, the path must be parenthesized: '(//a)[1]'.
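The difference between '//a[1]' and '(//a)[1]' can be seen in a small sketch using Python's standard xml.etree.ElementTree module. ElementTree supports only a subset of XPath 1.0 and has no syntax for a parenthesized path, so taking the first match in document order is used here as the nearest equivalent of '(//a)[1]'. The document and names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
doc = ET.fromstring(
    "<html><body>"
    "<p><a href='one.php'/><a href='two.php'/></p>"
    "<p><a href='three.php'/></p>"
    "</body></html>"
)

# .//a[1] keeps every a element that is the first a child of its own parent,
# so one link per paragraph is returned.
first_per_parent = [a.get("href") for a in doc.findall(".//a[1]")]

# Closest ElementTree equivalent of (//a)[1]: the first a element found
# in document order across the whole tree.
first_in_document = doc.find(".//a").get("href")

print(first_per_parent)   # ['one.php', 'three.php']
print(first_in_document)  # one.php
```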
Functions and operators
XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and booleans.
The available operators are:
* The "/", "//" and " [...] " operators, used in path expressions, as described above.
* A union operator, "|", which forms the union of two node-sets.
* Boolean operators "and" and "or", and a function "not()"
* Arithmetic operators "+", "-", "*", "div" (divide), and "mod"
* Comparison operators "=", "!=", "<", ">", "<=", ">="
The function library includes:
* Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
* Functions to manipulate numbers: sum(), round(), floor(), ceiling()
* Functions to get properties of nodes: name(), local-name(), namespace-uri()
* Functions to get information about the processing context: position(), last()
* Type conversion functions: string(), number(), boolean()
Some of the more commonly useful functions are detailed below. For a complete description, see [http://www.w3.org/TR/xpath the W3C Recommendation document].
Node set functions
;position() :returns a number representing the position of this node in the sequence of nodes currently being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
;count(node-set) :returns the number of nodes in the node-set supplied as its argument.
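A minimal sketch of these two functions, assuming Python's standard xml.etree.ElementTree module as the processor. ElementTree cannot evaluate count() and does not accept position() by name, so the count is taken in Python and the position is written with the abbreviated positional predicate; last() is accepted inside a predicate. The sample list is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
doc = ET.fromstring(
    "<list><item>a</item><item>b</item><item>c</item></list>"
)

# Analogue of count(item): count the nodes returned by the path.
item_count = len(doc.findall("item"))

# item[2] is the abbreviated form of item[position()=2].
second = doc.find("item[2]").text

# item[last()] selects the item whose position equals the size of the node-set.
final = doc.find("item[last()]").text

print(item_count, second, final)  # 3 b c
```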
String functions
;string(object?) :converts any of the four XPath data types into a string according to built-in rules. If the value of the argument is a node-set, the function returns the string-value of the first node in document order, ignoring any further nodes.
;concat(string, string, string*) :concatenates two or more strings.
;contains(s1, s2) :returns 'true' if 's1' contains 's2'.
;normalize-space(string?) :all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been pretty-printed, which could make further string processing unreliable.
Boolean functions
;not(boolean) :negates any boolean expression.
;true() :evaluates to "true".
;false() :evaluates to "false".
Number functions
;sum(node-set) :converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
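The conversion-then-sum behaviour can be approximated in Python as follows. This is a sketch only: the standard xml.etree.ElementTree module cannot evaluate sum() itself, so the code mimics an expression such as sum(//item/@price) by hand. The helper xpath_number is a hypothetical name, and it only roughly follows the XPath number() rules (Python's float() accepts some strings that XPath would treat as NaN). The order document is invented for this illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
doc = ET.fromstring(
    "<order>"
    "<item price='2.50'/><item price='4'/><item price='1.25'/>"
    "<item/>"
    "</order>"
)

def xpath_number(text):
    """Roughly mimic XPath number(): strings that do not parse become NaN."""
    try:
        return float(text.strip())
    except ValueError:
        return math.nan

# Analogue of sum(//item/@price): only items that carry a price attribute
# contribute a node, so the item without one is skipped by the predicate.
priced_items = doc.findall(".//item[@price]")
total = sum(xpath_number(item.get("price")) for item in priced_items)

print(total)  # 7.75
```

As in XPath, a single value that cannot be converted would make the whole sum NaN.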
Usage examples
Expressions can be created inside predicates using the operators '=', '!=', '<=', '<', '>=' and '>'. Boolean expressions may be combined with brackets '()' and the boolean operators 'and' and 'or' as well as the 'not()' function described above. Numeric calculations can use '*', '+', '-', 'div' and 'mod'. Strings can consist of any Unicode characters.
//item[@price > 2*@discount]
selects items whose price attribute is greater than twice the numeric value of their discount attribute.
Entire node-sets can be combined ('unioned') using the pipe character '|'. Node sets that meet one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.
v[x or y] | w[z]
will return a single node-set consisting of all the 'v' elements that have 'x' or 'y' child-elements, as well as all the 'w' elements that have 'z' child-elements, that were found in the current context.
Examples
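Given a sample XML document
<wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
The XPath expression
/wikimedia/projects/project/@name
selects name attributes for all projects, and
/wikimedia//editions
selects all editions of all projects, and
/wikimedia/projects/project/editions/edition[@language="English"]/text()
selects addresses of all English Wikimedia projects (text of all edition elements where language attribute is equal to "English"), and the following
/wikimedia/projects/project[@name="Wikipedia"]/editions/edition/text()
selects addresses of all Wikipedias (text of all edition elements that exist under project element with a name attribute of "Wikipedia").
Implementations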
;Command Line Tools
* XMLStarlet
;ActionScript
* XPath4AS/XPath4AS2 [http://www.xfactorstudio.com/]
;C/C++
* libxml2
* [http://reasoning.info Reason - C++ Library]
* [http://tinyxpath.sourceforge.net/ TinyXpath]
* [http://xml.apache.org/xalan-c/ Apache Xalan-C++]
* [http://software.decisionsoft.com/pathanIntro.html Pathan]
* [http://xqilla.sourceforge.net XQilla] is an XQuery and XPath 2.0 Open Source library, implemented on top of the Xerces-C library
* Sedna XML Database
* VTD-XML
;Delphi
* [http://sourceforge.net/projects/tpxmlpartner/ TurboPower XML Partner] (now "legally" released free-of-charge on SourceForge by its original, now-defunct developer)
;Implementations for Database Engines
* OpenLink Virtuoso
;Java
* [http://jaxen.codehaus.org/ Jaxen] is an Open Source XPath implementation supporting (embedded by) multiple XML parsers (XOM, Dom4J, JDom).
* [http://xml.apache.org/xalan-j/ Apache Xalan-Java] supports XPath 1.0 (as well as XSLT 1.0)
* [http://saxon.sf.net/ Saxon] supports XPath 1.0 and XPath 2.0 (as well as XSLT 1.0, XSLT 2.0, and XQuery 1.0)
* VTD-XML [http://vtd-xml.sourceforge.net]
* Sedna XML Database (both XML:DB and proprietary)
The Java package javax.xml.xpath has been part of Java standard edition since Java 5. Technically this is an XPath API rather than an XPath implementation, and it allows the programmer to select a specific implementation that conforms to the interface.
;JavaScript
* [http://goog-ajaxslt.sourceforge.net/ Google AJAXSLT]
* [http://coderepos.org/share/wiki/JavaScript-XPath JavaScript-XPath]
* [http://www.llamalab.com/js/xpath/ LlamaLab XPath.js]
* [http://www.jquery.com/ jQuery] (basic support)
;.NET Framework
* In the System.Xml and System.Xml.XPath namespaces [http://msdn2.microsoft.com/en-us/library/system.xml.aspx]
* Sedna XML Database
;Perl
* [http://search.cpan.org/dist/XML-LibXML/LibXML.pod XML::LibXML] (based upon libxml2)
* [http://search.cpan.org/dist/XML-XPath/XPath.pm XML::XPath]
* [http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath/ HTML::TreeBuilder::XPath] (parses non-strict HTML with XPath)
;PHP
* [http://us.php.net/manual/en/class.domxpath.php DOMXPath]
* Sedna XML Database
;Python
* [http://pyxml.sourceforge.net/ PyXML]
* [http://xmlsoft.org/python.html libxml2]
* [http://codespeak.net/lxml/ lxml] (based upon libxml2, aims to be more pythonic than default libxml2 bindings)
* [http://effbot.org/zone/element-index.htm ElementTree] (small subset only)
* [http://4suite.org/index.xhtml 4Suite]
* [http://uche.ogbuji.net/tech/4suite/amara/ Amara]
* Sedna XML Database
;Ruby
* [http://www.germane-software.com/software/rexml REXML]
* [http://code.whytheluckystiff.net/hpricot Hpricot] (implements a subset)
* [http://libxml.rubyforge.org/ libxml2]
;Scheme
* [http://okmij.org/ftp/Scheme/xml.html#SXPath SXPath]
* Sedna XML Database
;SQL
* [http://dev.mysql.com/tech-resources/articles/mysql-5.1-xml.html MySQL] supports a subset of XPath from version 5.1.5 onwards
Use of XPath in Schema Languages
XPath is increasingly used to express constraints in schema languages for XML.
* The (now ISO standard) schema language Schematron pioneered the approach.
* A streaming subset of XPath is used in W3C XML Schema for expressing uniqueness and key constraints.
* XForms uses XPath to bind types to values.
* The approach has even found use in non-XML applications, such as the constraint language for Java called PMD: the Java is converted to a DOM-like parse tree, then XPath rules are defined over the tree.
See also
* XML
* XSL, XSLT, XSL-FO
* XQuery
* XLink, XPointer
* XML Schema
* Schematron
* STXPath
* Navigational Database
* XML Database
External links
* [http://www.w3.org/TR/xpath XPath 1.0 specification]
* [http://www.di.unipi.it/~ghelli/didattica/SSD/xpath/xpath-leashed.pdf A survey on theoretical aspects of XPath (Michael Benedikt and Christoph Koch: "XPath Leashed!", to appear in ACM Computing Surveys, March 2009)]
* [http://www.data2type.de/xml-xslt-xslfo/xpath-einfuehrung XPath tutorial (German)]
Wikimedia Foundation. 2010.