- VTD-XML
Infobox Software
name = VTD-XML
developer = XimpleWare
latest_release_version = 2.4
latest_release_date = April 2, 2008
operating_system = Portable
blog = [http://vtd-xml.blogspot.com VTD-XML blog]
genre = XML parser/indexer/slicer/editor library
license = GPL and proprietary license
website = [http://vtd-xml.sourceforge.net/ vtd-xml.sourceforge.net] [http://vtd-xml.blogspot.com VTD-XML blog]
Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of efficient XML processing technologies centered around a non-extractive [ [http://www.xml.com/pub/a/2004/05/19/parsing.html Non-extractive Parsing for XML] ], "document-centric" XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as one of the following:
*A "document-centric" [ [http://www.devx.com/xml/Article/36379 Manipulate XML Content the Ximple Way] Introduces the concept of document-centric XML processing and why it is where the future lies ] [ [http://www.codeproject.com/KB/cs/xml_processing_future.aspx VTD-XML: XML Processing for the Future (Part II)] ] XML parser [ [http://www.javaworld.com/javaworld/jw-03-2006/jw-0327-simplify.html Simplify XML Processing with VTD-XML] ] [ [http://www.devx.com/xml/Article/22219 Better Faster XML Processing with VTD-XML] ] [ [http://www.codeproject.com/KB/cs/vtd-xml_examples.aspx?display=PrintAll VTD-XML: XML Processing for the future (Part I)] ]
*A native XML indexer or a file format that uses binary data to enhance the text XML [ [http://xml.sys-con.com/read/453082.htm Index XML documents with VTD-XML] ]
*An incremental XML content modifier [ [http://vtd-xml.blogspot.com/2007/04/since-cut-paste-split-and-assemble-xml.html XML content update using XMLModifier (Part I)] ] [ [http://vtd-xml.blogspot.com/2007_07_01_archive.html XML content update using XMLModifier (Part II)] ]
*An XML slicer/splitter/assembler [ [http://www.javaworld.com/javaworld/jw-07-2006/jw-0724-vtdxml.html Cut, Paste, Split and Assemble XML documents with VTD-XML] ]
*An XML editor/eraser
*A way to port XML processing on chip [ [http://www.ximpleware.com/wp_SUN.pdf XML on a chip?] ] [ [http://www.xml.com/pub/a/2005/03/09/chip.html XML on a Chip] ] [ [http://www.w3.org/2003/08/binary-interchange-workshop/20-ximpleware-positionpaper-updated.htm XimpleWare's W3C binary XML workshop Position Paper ] ]
*A non-blocking, stateless XPath evaluator [ [http://www.devx.com/xml/Article/34045/0/page/1 Improve XPath efficiency with VTD-XML] ]
VTD-XML is developed by XimpleWare and dual-licensed under the GPL and a proprietary license. It was originally written in Java, but is now also available in C [ [http://www.developer.com/java/other/article.php/3714051 VTD-XML: a New Vision of XML] by courtesy of Victor R. Volkman ] and C#.
Basic Concept
Non-Extractive, Document-Centric Parsing
Traditionally, a lexical analyzer represents tokens (the smallest indivisible units of character data) as discrete string objects. This approach is known as "extractive" parsing. In contrast, "non-extractive" tokenization keeps the source text intact and describes each token by its offset and length within that text.
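The difference can be illustrated with a minimal Java sketch. It is not taken from the VTD-XML code base: the class `NonExtractiveTokenizer`, its `Span` type, and the choice of ASCII spaces as token delimiters are all assumptions made for illustration. The tokenizer records where each token sits in the source buffer and creates a string only when a caller asks for one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of non-extractive tokenization; not part of VTD-XML. */
final class NonExtractiveTokenizer {

    /** A token described only by its position in the source buffer. */
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits on ASCII spaces, recording spans instead of creating strings. */
    static List<Span> tokenize(byte[] source) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= source.length; i++) {
            boolean boundary = (i == source.length) || source[i] == ' ';
            if (!boundary && start < 0) {
                start = i;                              // a token begins here
            } else if (boundary && start >= 0) {
                spans.add(new Span(start, i - start));  // the token ends just before i
                start = -1;
            }
        }
        return spans;
    }

    /** Materializes a string only on demand; the source buffer is never modified. */
    static String text(byte[] source, Span span) {
        return new String(source, span.offset, span.length, StandardCharsets.UTF_8);
    }
}
```

An extractive tokenizer would instead return a list of strings, allocating one object per token whether or not the caller ever reads it.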
Virtual Token Descriptor
Virtual Token Descriptor (VTD) applies the concept of non-extractive, document-centric parsing to XML processing. A VTD record uses a 64-bit integer to encode the offset, length, token type and nesting depth of a token in an XML document. Because all VTD records are 64-bit in length, they can be stored efficiently and managed as an array.
Location Cache
Location Caches (LC) build on VTD records to provide efficient random access. Organized as tables, with one table per nesting depth level, LCs contain entries modeling an XML document's element hierarchy. An LC entry is a 64-bit integer encoding a pair of 32-bit values. The upper 32 bits identify the VTD record for the corresponding element. The lower 32 bits identify that element's first child in the LC at the next lower nesting level.
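The packing of VTD records and LC entries into 64-bit integers can be sketched as follows. This is an illustration only: the class `VtdSketch` is hypothetical, and the field widths chosen for the VTD record (30-bit offset, 20-bit length, 4-bit token type, 8-bit depth) are assumptions that may differ from the layout the library actually uses. The LC entry follows the description above, with the VTD record index in the upper 32 bits and the first-child index in the lower 32 bits.

```java
/** Hypothetical sketch of 64-bit VTD record and LC entry packing; not the VTD-XML API. */
final class VtdSketch {
    // Assumed field widths, chosen for illustration only.
    static final int OFFSET_BITS = 30;
    static final int LENGTH_BITS = 20;
    static final int TYPE_BITS = 4;
    static final int DEPTH_BITS = 8;

    static final int LENGTH_SHIFT = OFFSET_BITS;
    static final int TYPE_SHIFT = LENGTH_SHIFT + LENGTH_BITS;
    static final int DEPTH_SHIFT = TYPE_SHIFT + TYPE_BITS;

    static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the four token properties into one long. */
    static long pack(int offset, int length, int type, int depth) {
        return (offset & mask(OFFSET_BITS))
                | ((length & mask(LENGTH_BITS)) << LENGTH_SHIFT)
                | ((type & mask(TYPE_BITS)) << TYPE_SHIFT)
                | ((depth & mask(DEPTH_BITS)) << DEPTH_SHIFT);
    }

    static int offset(long record) {
        return (int) (record & mask(OFFSET_BITS));
    }

    static int length(long record) {
        return (int) ((record >>> LENGTH_SHIFT) & mask(LENGTH_BITS));
    }

    static int type(long record) {
        return (int) ((record >>> TYPE_SHIFT) & mask(TYPE_BITS));
    }

    static int depth(long record) {
        return (int) ((record >>> DEPTH_SHIFT) & mask(DEPTH_BITS));
    }

    /** LC entry: upper 32 bits = VTD record index, lower 32 bits = first-child index. */
    static long lcEntry(int vtdIndex, int firstChildIndex) {
        return ((long) vtdIndex << 32) | (firstChildIndex & 0xFFFFFFFFL);
    }

    static int lcVtdIndex(long entry) {
        return (int) (entry >>> 32);
    }

    static int lcFirstChild(long entry) {
        return (int) entry; // keeps the low 32 bits
    }
}
```

Because every record is a primitive `long`, a whole document's records fit in a `long[]` with no per-token object allocation, which is the property the text above relies on.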
Benefits
Overview
Virtually all the core benefits of VTD-XML are inherent to non-extractive, document-centric parsing, which provides these characteristics:
*The source XML text is kept intact in memory without decoding.
*The internal representation of VTD-XML is inherently persistent.
*It obviates object-oriented modeling of the hierarchical representation, as it relies entirely on primitive data types (e.g., 64-bit integers) to represent the XML hierarchy, thus reducing object creation cost to nearly zero [ [http://webservices.sys-con.com/read/250512.htm i-Technology Viewpoint: The Performance Woe of Binary XML @ SOA WORLD MAGAZINE ] ].
Combining those characteristics permits thinking of XML purely as syntax (bits, bytes, offsets, lengths, fragments, namespace-compensated fragments, and document composition) instead of the serialization/deserialization of objects. This is a powerful way to think about XML/SOA applications.
As Parser
When used in parsing mode, VTD-XML is a general purpose, high performance [ [http://vtd-xml.sourceforge.net/benchmark4.html VTD-XML Parsing/Navigation Perforamnce Report] ] XML parser which compares favorably with others:
* VTD-XML typically outperforms SAX (with NULL content handler) by 100%, while still providing full random access and built-in XPath support.
* VTD-XML typically consumes 1.3-1.5 times the XML document's size in memory, which is about 1/5 the memory usage of DOM.
* Applications written in VTD-XML are usually much shorter and cleaner than their DOM or SAX versions.
As Indexer
Because of the inherent persistence of VTD-XML, developers can write the internal representation of a parsed XML document to disk and later reload it to avoid repetitive parsing. To this end, XimpleWare has introduced VTD+XML as a binary packaging format combining VTD, LC and the XML text. It can typically be viewed in one of the following two ways:
*A native XML index that completely eliminates the parsing cost and also retains all benefits of XML. It is a file format that is human readable and backward compatible with XML.
*A binary XML format that uses binary data to enhance the processing of the XML text.
XML Content Modifier
Because VTD-XML keeps the XML text intact without decoding, when an application intends to modify the content of XML it only needs to modify the portions most relevant to the changes. This is in stark contrast with DOM, SAX, or StAX parsing, which incur the cost of parsing and re-serialization no matter how small the changes are.
Since VTDs refer to document elements by their offsets, changes to the length of elements occurring earlier in a document require adjustments to VTDs referring to all later elements. However, those adjustments are integer additions, albeit to many integers in multiple tables, so they are quick.
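The adjustment described above amounts to one integer addition per affected record. The following Java sketch shows the idea on a plain array of token offsets; the class `OffsetShift` and its method are hypothetical and are not how the library itself is organized. With packed 64-bit records, the same addition would be applied to the offset field of each record.

```java
/** Hypothetical sketch of offset adjustment after an edit; not the VTD-XML API. */
final class OffsetShift {

    /**
     * Shifts every token that starts at or after the end of the edited span.
     *
     * @param offsets token start offsets, one per token
     * @param editEnd offset just past the replaced span in the original text
     * @param delta   new span length minus old span length (may be negative)
     */
    static void shiftAfter(int[] offsets, int editEnd, int delta) {
        for (int i = 0; i < offsets.length; i++) {
            if (offsets[i] >= editEnd) {
                offsets[i] += delta; // one integer addition per affected token
            }
        }
    }
}
```

For example, replacing a 5-byte span that starts at offset 10 with 8 bytes of new content gives `editEnd = 15` and `delta = 3`; tokens starting before offset 15 keep their offsets and all later ones move by three bytes.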
XML Slicer/Splitter/Assembler
An application based on VTD-XML can also use offsets and lengths to address tokens or element fragments. This allows XML documents to be manipulated like arrays of bytes.
*As a slicer, VTD-XML can "slice" off a token or an element fragment from an XML document, then insert it back into another location in the same document, or into a different document.
*As a splitter, VTD-XML can split sub-elements in an XML document and dump each into a separate XML document.
*As an assembler, VTD-XML can "cut" chunks out of multiple XML documents and assemble them into a new XML document.
XML Editor/Eraser
Used as an editor/eraser, VTD-XML can directly edit or erase the underlying byte content of the XML text, provided that the token length is wider than the intended new content. An immediate benefit of this approach is that the application can reuse the original VTD and LC without reparsing. In contrast, when using VTD-XML to incrementally update an XML document, an application needs to reparse the updated document before processing it.
An editor can be made smart enough to track the location of each token, permitting new, longer tokens to replace existing, shorter tokens by merely addressing the new token in separate memory outside that used to store the original document. Likewise, when reordering the document, element text need not be copied; only the LCs need be updated. When a complete, contiguous XML document is needed, such as when saving it, the disparate parts can be reassembled into a new, contiguous document.
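The simple in-place case can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `InPlaceEdit` is hypothetical, and padding the unused bytes with spaces is an assumption that only suits applications for which the extra whitespace is acceptable. The document length and every other token's offset stay unchanged, which is why existing offset/length records remain valid.

```java
import java.util.Arrays;

/** Hypothetical sketch of in-place token editing; not the VTD-XML API. */
final class InPlaceEdit {

    /**
     * Overwrites the token at [offset, offset + length) with new content.
     * Returns false and leaves the document untouched if the content does not fit.
     */
    static boolean overwrite(byte[] doc, int offset, int length, byte[] content) {
        if (content.length > length) {
            return false; // too long: an in-place edit is not possible
        }
        System.arraycopy(content, 0, doc, offset, content.length);
        // Assumption: pad the remainder of the old token with spaces.
        Arrays.fill(doc, offset + content.length, offset + length, (byte) ' ');
        return true;
    }

    /** Erasing is the same operation with empty content. */
    static boolean erase(byte[] doc, int offset, int length) {
        return overwrite(doc, offset, length, new byte[0]);
    }
}
```

When `overwrite` returns false, an application would fall back to the incremental-update path, which changes the document length and therefore requires offsets to be adjusted or the document to be reparsed.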
Other Benefits
VTD-XML also pioneers the non-blocking, stateless XPath evaluation approach.
Weaknesses
VTD-XML also exhibits a few noticeable shortcomings:
*As an XML parser, it does not support external entities declared in the DTD.
*As a file format, it increases the document size by about 30% to 50%.
*As an API, it is not compatible with DOM or SAX.
*It is difficult to support certain validation techniques, employed by DTD and XML Schema (e.g., default attributes and elements), that require modifications to the XML instances being parsed.
Areas of Applications
General-purpose Replacement for DOM or SAX
Because of VTD-XML's performance and memory advantages, it covers a larger portion of XML use cases than either DOM or SAX [http://www.devx.com/xml/Article/30484 Process Large XML documents with VTD-XML] .
*Compared to DOM, VTD-XML processes bigger (3x~5x) XML documents for the same amount of physical memory at about 3 to 10 times the performance.
*Compared to SAX, VTD-XML provides random access and XPath support and outperforms SAX by at least 2x.
For SOA/WS/XML Security
The combination of VTD-XML's high performance and incremental-update capability makes it essential [ [http://www.javaworld.com/javaworld/jw-01-2007/jw-01-vtd.html Accelerate WSS Applications with VTD-XML] ] [ [http://www.ximpleware.com/security/ W3C workshop presentation on XML security] ] [http://www.w3.org/2007/xmlsec/ws/papers/06-zhang-ximpleware/] to achieve the desired level of QoS for SOA/WS/XML security applications.
For SOA/WS/XML Intermediary
VTD-XML is well suited for SOA intermediary applications such as XML routers/switches/gateways, Enterprise Service Buses, and services aggregation points. All those applications perform the basic "store and forward" operations for which retaining the original XML is critical for minimizing latency. VTD-XML's incremental update capability also contributes significantly to the forwarding performance.
VTD-XML's random-access capability lends itself well to XPath-based XML routing/switching/filtering common in AJAX and SOA deployment.
Intelligent SOA/WS/XML Load-balancing and Offloading
When an XML document travels through several middle-tier SOA components, the first message stop, after finishing the inspection of the XML document, can choose to send the VTD+XML file format to the downstream components to avoid repetitive parsing, thus improving throughput.
By the same token, an intelligent SOA load balancer can choose to generate VTD+XML for incoming/outgoing SOAP messages to offload XML parsing from the application servers that receive those messages.
XML Persistence Data Store
When viewed from the perspective of native XML persistence, VTD-XML can be used as a human-readable, easy to use, general-purpose XML index. XML documents stored this way can be loaded into memory to be queried, updated, or edited without the overhead of parsing/re-serialization.
Schemaless XML Data Binding
VTD-XML's combination of high performance, low memory usage, and non-blocking XPath evaluation makes possible a new XML data binding approach based entirely on XPath. The biggest benefits of this approach are that it no longer requires an XML schema, avoids needless object creation, and takes advantage of XML's inherent loose encoding [ [http://www.onjava.com/pub/a/onjava/2007/09/07/schema-less-java-xml-data-binding-with-vtd-xml.html Schemaless Java-XML data binding with VTD-XML] ].
Essential Classes
As of Version 2.2, the Java and C# versions of VTD-XML consist of the following classes:
*VTDGen (VTD Generator) is the class that encapsulates the main parsing, index loading and index writing functions.
*VTDNav (VTD Navigator) is the class that (1) encapsulates XML, VTD, and hierarchical info, (2) contains various navigation methods, (3) performs various comparisons between VTD records and strings, and (4) converts VTD records to primitive data types.
*AutoPilot is a class containing functions that perform node-level iteration and XPath.
*XMLModifier is a class that offers incremental update capability, such as delete, insert and update.