- YAML
YAML (IPA: /ˈjæməl/, rhymes with "camel") is a human-readable data serialization format that takes concepts from languages such as XML, C, Python, and Perl, as well as the format for electronic mail as specified by RFC [http://www.rfc-editor.org/rfc/rfc2822.txt 2822]. YAML was first proposed by Clark Evans in 2001, [Evans, Clark (May 11, 2001). "YAML Draft 0.1". Yahoo! Tech groups: sml-dev. http://tech.groups.yahoo.com/group/sml-dev/message/4710. Accessed 2008-08-02.] who designed it together with Ingy döt Net and Oren Ben-Kiki. It is available for several programming and scripting languages.
"YAML" is a recursive acronym for "YAML Ain't Markup Language". Early in its development, "YAML" was said to mean "Yet Another Markup Language", but it was retronymed to distinguish its purpose as data-centric, rather than document markup.
Features
YAML syntax was designed to be easily mapped to data types common to most high-level languages: list, hash, and scalar. [For purposes of this article, the terms (list and array), (hash, dictionary and mapping) and (string and scalar) are used interchangeably. Such usage is a simplification and may not be correct when specifically applied to some programming languages.] Its familiar indented outline and lean appearance make it especially suited for tasks where humans are likely to view or edit data structures, such as configuration files, dumping during debugging, and document headers (e.g. the headers found on most e-mails are very close to YAML). Although well-suited for hierarchical data representation, it also has a compact syntax for relational data. [A hierarchical model only gives a fixed, monolithic view of the tree structure. For example, either actors under movies, or movies under actors. YAML allows both using a relational model.] Its line and whitespace delimiters make it friendly to "ad hoc" grep/Python/Perl/Ruby operations. A major part of its accessibility comes from eschewing the use of enclosures like quotation marks, brackets, braces, and open/close-tags, which can be hard for the human eye to balance in nested hierarchies.
Examples
Sample document
Data structure hierarchy is maintained by outline indentation.
    ---
    receipt:     Oz-Ware Purchase Invoice
    date:        2007-08-06
    customer:
        given:   Dorothy
        family:  Gale

    items:
        - part_no:   A4786
          descrip:   Water Bucket (Filled)
          price:     1.47
          quantity:  4

        - part_no:   E1628
          descrip:   High Heeled "Ruby" Slippers
          price:     100.27
          quantity:  1

    bill-to:  &id001
        street: |
                123 Tornado Alley
                Suite 16
        city:   East Westville
        state:  KS

    ship-to:  *id001

    specialDelivery:  >
        Follow the Yellow Brick Road to the Emerald City.
        Pay no attention to the man behind the curtain.
    ...
Notice that strings do not require enclosure in quotations. The specific number of spaces in the indentation is unimportant as long as parallel elements have the same left justification and the hierarchically nested elements are indented further. The sample document defines a hash with 7 top-level keys: one of the keys, "items", contains a 2-element array (or "list"), each element of which is itself a hash with four keys. Relational data and redundancy removal are displayed: the "ship-to" hash content is copied from the "bill-to" hash's content as indicated by the anchor (&) and reference (*) labels. Optional blank lines can be added for readability. Multiple documents can exist in a single file/stream and are separated by "---". An optional "..." can be used at the end of a file (useful for signalling an end in streamed communications without closing the pipe).
Language elements
Basic components of YAML
YAML offers both an indented and an "in-line" style for denoting hashes and lists. Here is a sampler of the components.
Lists
Conventional block format uses a dash to begin a new item in a list:
    --- # Favorite movies
    - Casablanca
    - North by Northwest
    - Notorious
Optional inline format is delimited by comma+space and enclosed in brackets (similar to JSON):
    --- # Shopping list
    [milk, pumpkin pie, eggs, juice]
Hashes
Keys are separated from values by a colon-space.
    --- # Block
    name: John Smith
    age: 33
    --- # Inline
    {name: John Smith, age: 33}
Block literals
Strings do not require quotation.
Newlines preserved
    --- |
      There once was a man from Darjeeling
      Who got on a bus bound for Ealing
      It said on the door
      "Please don't spit on the floor"
      So he carefully spat on the ceiling
By default, trailing white space is stripped. Use |+ to keep trailing whitespace. Leading whitespace is trimmed to the first line's indent. Use |8 to add a leading whitespace indent (where 8 can be replaced by another number).
Newlines folded
    --- >
      Wrapped text
      will be folded
      into a single
      paragraph

      Blank lines denote
      paragraph breaks
Folded text converts newlines to spaces. This behavior cannot be overridden. Use >- to strip white space at the end of the "paragraph" only, where a YAML "paragraph" ends after the last non-empty line.
Hierarchical combinations of elements
Lists of hashes
    - {name: John Smith, age: 33}
    - name: Mary Smith
      age: 27
Hashes of lists
    men: [John Smith, Bill Jones]
    women:
      - Mary Smith
      - Susan Williams
Advanced components of YAML
As discussed in a subsequent section, two features that distinguish YAML from the capabilities of other data serialization languages are Relational trees and Data Typing.
Relational trees
Data merge and references
For clarity, compactness, and avoiding data entry errors, YAML provides node references (*) and hash merges (<<) that refer to a node labeled with an anchor (&) tag. References branch the tree to the anchor and work for all data types (see the ship-to reference in the example above). Merges are for hashes only, and merge the keys at the anchor into the referring hashmap.
Merges and references are automatically expanded by the parser when the data structure is instantiated. This can greatly enhance readability and facilitate editing: below is an example of a queue in an instrument sequencer in which each subsequent step lists only the elements that are changed from the first step. When a YAML parser loads this array, all the "step" hashes will have the 5 keys specified in the first step.
    # sequencer protocols for Laser eye surgery
    ---
    - step:  &id001                  # defines anchor label &id001
        instrument:      Lasik 2000
        pulseEnergy:     5.4
        pulseDuration:   12
        repetition:      1000
        spotSize:        1mm

    - step:
        <<: *id001                   # merges key:value pairs defined in step1 anchor
        spotSize:        2mm         # overrides "spotSize" key's value

    - step:
        <<: *id001                   # merges key:value pairs defined in step1 anchor
        pulseEnergy:     500.0       # overrides key
        alert: >                     # adds additional key
            warn patient of
            audible pop
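The effect of the merge key on the loaded data can be pictured with plain Python dictionaries. The following is only an illustrative sketch of the result for the sequencer example above; it does not use a YAML library and is not how any particular parser implements merging. The names `first_step`, `merge`, and `steps` are hypothetical.

```python
# Illustration only: mimic how "<<: *id001" expands, using plain dicts.
# No YAML library is involved; all names here are hypothetical.

first_step = {            # the mapping labelled with the anchor &id001
    "instrument": "Lasik 2000",
    "pulseEnergy": 5.4,
    "pulseDuration": 12,
    "repetition": 1000,
    "spotSize": "1mm",
}


def merge(anchor, overrides):
    """Return a new dict: the anchor's keys first, then local keys, which win."""
    expanded = dict(anchor)      # copy, so the anchored mapping is untouched
    expanded.update(overrides)   # keys written in the referring hash override
    return expanded


steps = [
    first_step,
    merge(first_step, {"spotSize": "2mm"}),
    merge(first_step, {"pulseEnergy": 500.0,
                       "alert": "warn patient of audible pop\n"}),
]

for step in steps:
    print(sorted(step))
```

Every entry in `steps` carries the five keys of the first step; the second changes only `spotSize`, and the third changes `pulseEnergy` and gains an `alert` key.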
Data types
Explicit data typing is seldom seen in the majority of YAML documents since YAML autodetects simple types. Data types can be divided into three categories: core, defined, and user-defined. Core types are ones expected to exist in any parser (e.g. floats, ints, strings, lists, maps, ...). Many more advanced data types, such as binary data, are defined in the YAML specification but not supported in all implementations. Finally, YAML defines a way to extend the data type definitions locally to accommodate user-defined classes, structures or primitives (e.g. quad precision floats).
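Autodetection of simple types can be pictured as matching each unquoted scalar against a few patterns in turn. The sketch below is a deliberately simplified assumption of how such a resolver might look: it covers only decimal integers, simple floats, and a subset of the boolean words, and the helper name `resolve_plain_scalar` is hypothetical. Real parsers recognise more forms and may differ in detail.

```python
import re

# Simplified patterns. A real resolver also handles octal, hexadecimal,
# infinities, nulls, timestamps and more; none of that is covered here.
_INT = re.compile(r"[-+]?(?:0|[1-9][0-9]*)")
_FLOAT = re.compile(r"[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?")
_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
_FALSE = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain_scalar(text):
    """Guess the type of an unquoted scalar (hypothetical helper)."""
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    return text  # anything else stays a string


for raw in ["123", "123.0", "Yes", "Yes we have No bananas"]:
    print(repr(raw), "->", repr(resolve_plain_scalar(raw)))
```

A quoted scalar or one carrying an explicit tag would bypass this guessing step entirely, which is why quoting is the usual way to force a string.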
Casting data types
YAML autodetects the datatype of the entity. Sometimes one wants to cast the datatype explicitly. The most common situation is a single-word string that looks like a number, boolean or tag and therefore needs disambiguation, either by surrounding it with quotes or by using an explicit datatype tag.
    ---
    a: 123                     # an integer
    b: "123"                   # a string, disambiguated by quotes
    c: 123.0                   # a float
    d: !!float 123             # also a float via explicit data type prefixed by (!!)
    e: !!str 123               # a string, disambiguated by explicit type
    f: !!str Yes               # a string via explicit type
    g: Yes                     # a boolean True
    h: Yes we have No bananas  # a string, "Yes" and "No" disambiguated by context.
Other specified data types
Not every implementation of YAML has every specification-defined data type. These built-in types use a double exclamation sigil prefix ( !! ). Particularly interesting ones not shown here are sets, ordered maps, timestamps, and hexadecimal. Here's an example of binary data.
    ---
    picture: !!binary |
      R0lGODlhDAAMAIQAAP//9/X
      17unp5WZmZgAAAOfn515eXv
      Pz7Y6OjuDg4J+fn5OTk6enp
      56enmleECcgggoBADs=mZmE
Extension for user-defined data types
Many implementations of YAML can support user-defined data types. This is a good way to serialize an object. Local data types are not universal data types but are defined in the application using the YAML parser library. Local data types use a single exclamation mark ( ! ).
    ---
    myObject: !myClass { name: Joe, age: 15 }
Syntax
A compact [http://yaml.org/refcard.html cheat-sheet] (actually written in YAML) as well as a [http://yaml.org/spec/ full specification] are available at [http://yaml.org yaml.org]. The following is a synopsis of the basic elements.
* YAML streams are encoded using the set of printable Unicode characters, either in UTF-8 or UTF-16.
* Whitespace indentation is used to denote structure; however, tab characters are never allowed as indentation.
* Comments begin with the number sign ( # ), can start anywhere on a line, and continue until the end of the line.
* List members are denoted by a leading hyphen ( - ) with one member per line, or enclosed in square brackets ( [ ] ) and separated by comma space ( , ).
* Hashes are represented using the colon space ( : ) in the form "key: value", either one per line or enclosed in curly braces ( { } ) and separated by comma space ( , ).
** A hash key may be prefixed with a question mark ( ? ) to allow for liberal multi-word keys to be represented unambiguously.
* Strings (scalars) are ordinarily unquoted, but may be enclosed in double-quotes ( " ), or single-quotes ( ' ).
** Within double-quotes, special characters may be represented with C-style escape sequences starting with a backslash ( \ ).
* Block scalars are delimited with indentation with optional modifiers to preserve ( | ) or fold ( > ) newlines.
* Multiple documents within a single stream are separated by three hyphens ( --- ).
** Three periods ( ... ) optionally end a file within a stream.
* Repeated nodes are initially denoted by an ampersand ( & ) and thereafter referenced with an asterisk ( * ).
* Nodes may be labeled with a type or tag using the exclamation point ( !! ) followed by a string which can be expanded into a URI.
* YAML documents in a stream may be preceded by directives composed of a percent sign ( % ) followed by a name and space-delimited parameters. Two directives are defined in YAML 1.1:
** The %YAML directive is used to identify the version of YAML in a given document.
** The %TAG directive is used as a shortcut for URI prefixes. These shortcuts may then be used in node type tags.
YAML requires that colons and commas used as list separators be followed by a space so that scalar values containing embedded punctuation (such as 5,280 or http://www.wikipedia.org) can generally be represented without needing to be enclosed in quotes.
Two additional sigil characters are reserved in YAML for possible future standardisation: the at sign ( @ ) and accent grave ( ` ).
Comparison to other data structure format languages
While YAML shares similarities with JSON, XML and SDL, it also has characteristics that are unique in comparison to many other similar format languages.
JSON
JSON syntax is "nearly" [The syntax differences are subtle and seldom arise in practice: JSON allows extended character sets like UTF-32; YAML requires a space after separators like comma, equals, and colon while JSON does not; and some non-standard implementations of JSON extend the grammar to include JavaScript's /*...*/ comments. Handling such edge cases may require light pre-processing of the JSON before parsing as in-line YAML.] a subset of YAML, and most JSON documents can be parsed by a YAML parser. [ [http://redhanded.hobix.com/inspect/yamlIsJson.html Parsing JSON with SYCK] ] This is because JSON's semantic structure is equivalent to the optional "inline-style" of writing YAML. While extended hierarchies can be written in inline-style like JSON, this is not a recommended YAML style except when it aids clarity. YAML has additional features lacking in JSON such as extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order.
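The separator-spacing difference mentioned above is easy to see from the JSON side. The sketch below uses only Python's standard `json` module; the `record` data is made up for illustration, and no YAML parser is invoked, so whether a given YAML implementation accepts the compact form is left as an open assumption.

```python
import json

record = {"name": "John Smith", "age": 33, "tags": ["a", "b"]}

# By default json.dumps separates items with ", " and keys from values with
# ": ", which satisfies the space-after-separator rule of inline-style YAML.
as_json = json.dumps(record)
print(as_json)

# Compact separators drop those spaces. Output like this is the kind of edge
# case that may need light pre-processing before a YAML parser accepts it.
compact = json.dumps(record, separators=(",", ":"))
print(compact)
```

The first form reads as an inline YAML hash with double-quoted strings; the second is still valid JSON but no longer follows the colon-space and comma-space convention.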
XML and SDL
YAML lacks the notion of tag attributes that are found in XML and SDL. "For data structure serialization", tag attributes are, arguably, a feature of questionable utility since the separation of data and meta-data adds complexity when represented by the natural data structures (hashes, arrays) in common languages. [In Markup Languages, attribute values in an open-tag must be handled separately from the data value enclosed by the tags. Typically, to hold this in a data structure means each node is an object with storage for the tag-name plus a hash for any possible named attributes and their values, and then a separate scalar for holding any enclosed data. YAML treats these even-handedly: each node is a simple type, usually a scalar, array, or hash.] Instead YAML has extensible type declarations (including class types for objects). YAML itself does not have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, a YAML schema descriptor language [http://www.kuwata-lab.com/kwalify/ exists], and YAXML, which represents YAML data structures in XML, allows XML schema importers and output mechanisms like XSLT to be applied to YAML. Moreover, in typical use, the semantics provided by rich language-defined type declarations in the YAML document itself eliminate the need for an additional validator.
Indented delimiting
Because YAML primarily relies on outline indentation for structure, it is especially resistant to delimiter collision. YAML's insensitivity to quotes and braces in scalar values means one may embed XML, SDL, JSON or even YAML documents inside a YAML document by simply indenting it in a block literal:
    ---
    example: >
        HTML goes into YAML without modification
    message: |
        "Three is always greater than two, even for large values of two"
        --Author Unknown
    date: 2007-06-01
Conversely, placing YAML in XML or SDL content requires converting all whitespace and potential sigils (like <, > and &) to entity syntax. Placing YAML in JSON requires quoting it and escaping all interior quotes.
Non-hierarchical data models
Unlike SDL and JSON, which can only represent data in a hierarchical model with each child node having a single parent, YAML also offers a simple relational scheme that allows repeats of identical data to be referenced from two or more points in the tree rather than entered redundantly at those points. This is similar to the IDREF facility built into XML. [ [http://www.w3.org/TR/2000/REC-xml-20001006#idref XML IDREF] ] The YAML parser then expands these references into the fully populated data structures they imply when read in, so whatever program is using the parser does not have to be aware of a relational encoding model, unlike XML processors, which do not expand references. This expansion can enhance readability while reducing data entry errors in configuration files or processing protocols where many parameters remain the same in a sequential series of records while only a few vary. For example, the "ship-to" and "bill-to" records in an invoice are nearly always the same data.
Practical considerations
YAML is line oriented, and thus it is often simple to convert the unstructured output of existing programs into YAML format while having them retain much of the look of the original document. Because there are no close-tags, braces and quotation marks to balance, it is generally easy to generate well-formed YAML directly from distributed print statements within unsophisticated programs. Likewise, the white space delimiters facilitate quick-and-dirty filtering of YAML files using the line-oriented commands in grep, awk, perl, ruby, and python.
In particular, unlike mark-up languages, chunks of consecutive YAML lines tend to be well-formed YAML documents themselves. This makes it very easy to write parsers that do not have to process a document in its entirety (e.g. balancing open- and close-tags and navigating quoted and escaped characters) before they begin extracting specific records within. This property is particularly expedient when iterating in a single, stateless pass, over records in a file whose entire data structure is too large to hold in memory, or for which reconstituting the entire structure to extract one item would be prohibitively expensive.
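A single-pass reader of the kind described above might be sketched as follows. This is a rough illustration under stated assumptions, not a conformant parser: the hypothetical helper `iter_documents` only looks for lines that start with `---`, keeps one document in memory at a time, and yields raw text lines rather than parsed data. Any text before the first separator is yielded as its own chunk.

```python
import io


def iter_documents(lines):
    """Yield each document of a YAML stream as a list of raw text lines.

    Hypothetical helper: it does not parse YAML, it only splits on lines
    beginning with the '---' document separator.
    """
    current = []
    for line in lines:
        if line.startswith("---") and current:
            yield current          # hand over the finished document
            current = []
        current.append(line.rstrip("\n"))
    if current:
        yield current              # last document in the stream


stream = io.StringIO(
    "--- # Favorite movies\n"
    "- Casablanca\n"
    "- Notorious\n"
    "--- # Shopping list\n"
    "[milk, pumpkin pie, eggs, juice]\n"
)

for number, document in enumerate(iter_documents(stream), start=1):
    print(number, document)
```

Because the function is a generator over any iterable of lines, it can be fed an open file object and will never hold more than the current document.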
Counterintuitively, although its indented delimiting might seem to complicate deeply nested hierarchies, YAML handles indents as small as a single space, and this may achieve better compression than markup languages. Additionally, extremely deep indentation can be avoided entirely by either: 1) reverting to "inline-style" (i.e. JSON-like format) without the indentation; or 2) using relational anchors to unwind the hierarchy to a flat form that the YAML parser will transparently reconstitute into the full data structure.
Security
YAML is purely a data representation language and thus has no executable commands. [A proposed "yield" tag will allow for simple arithmetic calculation] This means that parsers will be (or at least should be) safe to apply to tainted data without fear of a latent command-injection security hole. For example, because JSON is native JavaScript it's tempting to use the JavaScript interpreter itself to evaluate the data structure into existence, leading to command injection holes when inadequately verified. While safe parsing is inherently possible in any data language, implementation is such a notorious pitfall that YAML's lack of an associated command language may be a relative security benefit.
Data processing and representation
The XML [ [http://www.w3.org/TR/REC-xml/ Extensible Markup Language (XML) 1.0 (Fourth Edition)], accessed 2007-11-04] [ [http://www.w3.org/TR/xml11/ Extensible Markup Language (XML) 1.1 (Second Edition)], accessed 2007-11-04] and YAML specifications [ [http://yaml.org/spec/current.html YAML Ain't Markup Language (YAML) Version 1.1], accessed 2007-11-04] provide very different "logical" models for data node representation, processing, and storage.
XML: The primary logical structures in an XML "instance document" are: 1) Element; and 2) Element attribute. [ [http://www.w3.org/TR/xml11/#sec-logical-struct Extensible Markup Language (XML) 1.1 (Second Edition)] ] For these primary logical structures, the base XML specification does not define constraints regarding such factors as duplication of elements or the order in which they are allowed to appear. [Note, however, that the XML specification does define an "Element Content Model" for XML instance documents that include validity constraints. Validity constraints are user-defined and not mandatory for a well-formed XML instance document. http://www.w3.org/TR/xml11/#sec-element-content. In the case of duplicate Element attribute declarations, the first declaration is binding and later declarations are ignored [http://www.w3.org/TR/REC-xml/#attdecls].] In defining conformance for XML processors, the XML specification generalizes them into two types: 1) "validating"; and 2) "non-validating". [ [http://www.w3.org/TR/REC-xml/#sec-conformance Extensible Markup Language (XML) 1.0 (Fourth Edition)] ] The XML specification asserts no detailed definitions for an API, processing model, or data representation model, although several are defined in separate specifications that a user or specification implementor may choose independently. These include the Document Object Model and XQuery.
A richer model for defining valid XML content is the W3C XML Schema standard [http://www.w3.org/XML/Schema]. This allows for full specification of valid XML content and is supported by a wide range of open source, free and commercial processors and libraries.
YAML: The primary logical structures in a YAML "instance document" [The YAML specification identifies an instance document as a "Presentation" or "character stream". [http://yaml.org/spec/current.html#id2506012] ] are: 1) Scalar; 2) Sequence; and 3) Mapping. Additional, optional-use, logical structures are enumerated in the YAML types repository. [ [http://yaml.org/type/index.html Language-Independent Types for YAML Version 1.1], accessed 2007-11-04. The tagged types in the YAML types repository are optional and therefore not essential for conformant YAML processors. "The use of these tags is not mandatory."] The YAML specification also indicates some basic constraints that apply to these primary logical structures. For example, according to the specification, mapping keys do not have an order. In every case where node order is significant, a sequence must be used. [ [http://yaml.org/spec/current.html#id2508372 YAML Ain't Markup Language (YAML) Version 1.1] ]
Moreover, in defining conformance for YAML processors, the YAML specification defines two primary operations: 1) Dump; and 2) Load. All YAML-compliant processors must provide "at least" one of these operations, and may optionally provide both. ["Dump" and "Load" operations consist of a few sub-operations, not all of which need to be exposed to the user or through an API (see http://yaml.org/spec/current.html#id2504671).] Finally, the YAML specification defines an "information model" or "representation graph" which must be created during processing for both "Dump" and "Load" operations, although this representation need not be made available to the user through an API. [ [http://yaml.org/spec/current.html#representation YAML Ain't Markup Language (YAML) Version 1.1] ]
Implementations
Portability
Simple YAML files (e.g. key value pairs) are readily parsed with regular expressions without resort to a formal YAML parser. YAML emitters and parsers for many popular languages written in the pure native language itself exist, making it portable in a self-contained manner. Bindings to C-libraries also exist when speed is needed.
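As a rough sketch of the regular-expression approach for flat files, the following Python reads one "key: value" pair per line. The pattern and the helper name `parse_flat` are assumptions made for illustration; they ignore nesting, lists, anchors, multi-line scalars and type detection, and quoted values keep their quote characters.

```python
import re

# One "key: value" pair per line, with an optional trailing "# comment".
# Hypothetical pattern, suitable only for flat, unnested files.
PAIR = re.compile(r"^(?P<key>[^\s#][^:]*?):\s+(?P<value>.*?)(?:\s+#.*)?$")


def parse_flat(text):
    """Return a dict of the top-level key/value pairs; values stay strings."""
    result = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:                      # separators, comments, blanks are skipped
            result[match.group("key").strip()] = match.group("value")
    return result


sample = (
    "--- # Block\n"
    "name: John Smith\n"
    "age: 33   # in years\n"
    "homepage: http://www.wikipedia.org\n"
)

print(parse_flat(sample))
```

Requiring whitespace after the colon is what lets the URL value pass through intact, mirroring the colon-space rule of the format itself.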
C libraries
* [http://pyyaml.org/wiki/LibYAML libYAML] As of 2007-06, this implementation of YAML 1.1 is stable and recommended by the YAML specification authors [YAML creator Clark Evans inserted this recommendation.] for production use (despite the 0.1.1 version number and a mild caution that the API is not barred from evolution).
* [http://whytheluckystiff.net/syck/ SYCK] This implementation supports most of the YAML 1.0 specification and is in widespread use. It is optimized for use with higher-level interpreted languages, obtaining speed by writing directly to the symbol table of the higher-level language when it can. As of 2005 it is no longer maintained but remains available.
Bindings
Bindings for YAML exist for the following languages:
*Perl
** [http://search.cpan.org/dist/YAML YAML::] is a common interface to several YAML parsers.
** [http://search.cpan.org/dist/YAML-Tiny/ YAML::Tiny] implements a useful subset of YAML; small, pure Perl, and faster than the full implementation.
** [http://search.cpan.org/dist/YAML-Syck/ YAML::Syck] Binding to the SYCK C library. Offers fast, highly featured YAML.
** [http://search.cpan.org/dist/YAML-LibYAML/ YAML::XS] Binding to LibYAML. Better YAML 1.1 compatibility.
*PHP
** [http://spyc.sourceforge.net/ Spyc] is a pure PHP implementation
** [http://pecl.php.net/package/syck PHP-Syck] (binding to SYCK library)
** [http://trac.symfony-project.org/browser/branches/1.1/lib/yaml/ sfYaml] is a rewrite of Spyc for the symfony project, which can be used as a standalone YAML parser and emitter
*Python
** [http://pyyaml.org/ PyYaml] Highly featured. Pure Python or optionally uses LibYAML.
** [http://pyyaml.org/wiki/PySyck PySyck] Binding to SYCK C-Library
*Ruby (YAML included in the standard library since 1.8, based on SYCK)
** [http://rubyforge.org/projects/ya2yaml/ Ya2YAML] with full UTF-8 support
*Java
** [https://jvyaml.dev.java.net/ jvyaml] based on Syck, and patterned off ruby-yaml
** [http://jyaml.sourceforge.net/ JYaml] pure Java implementation
*R
** [http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/YamlR CRAN YAML] based on SYCK
*JavaScript
**Native JavaScript emits but does not read YAML
** [http://sourceforge.net/projects/yaml-javascript YAML JavaScript] emitter and parser
*.NET Framework
** [http://yaml-net-parser.sourceforge.net/ project page]
*OCaml
** [http://sourceforge.net/projects/ocaml-syck/ OCaml-Syck]
*C++
** [http://git.snoyman.com/cppweb.git?a=blob;f=src/cppmodels/yaml.hpp;h=e67377c792309a51eb5a4c9dac05ba89befd38d6;hb=HEAD C++ wrapper for libYaml]
*Objective-C
** [http://code.whytheluckystiff.net/syck/browser/trunk/ext/cocoa/src Cocoa-Syck]
*Lua
** [http://code.whytheluckystiff.net/syck/browser/trunk/ext/lua Lua-Syck]
*Haskell
** [http://ben-kiki.org/oren/YamlReference/ Haskell Reference wrappers]
*XML [http://yaml.org/xml.html YAXML] (currently draft only)
Pitfalls and implementation defects
*Editors:
**An editor mode that autoexpands tabs to spaces and displays text in a fixed-width font is recommended.
**The editor needs to handle UTF-8 and UTF-16 correctly (otherwise, it will be necessary to use only ASCII as a subset of UTF-8).
*Strings:
**YAML allows one to avoid quoted strings which can enhance readability and avoid the need for nested escape sequences. However, this leads to a pitfall when inline strings are ambiguous single words (e.g. digits or boolean words) or when the un-quoted phrase accidentally contains a YAML construct (e.g., a leading exclamation point or a colon-space after a word: "!Caca de vaca!" or "Caution: lions ahead"). This is not an issue that anyone using a proper YAML emitter will confront, but can come up in "ad hoc" scripts or human editing of files. In such a case a better approach is to use block literals ("|" or ">") rather than inline string expressions as these have no such ambiguities to resolve.
*Anticipating implementation idiosyncrasies:
**Some implementations of YAML, such as Perl's YAML::BASE, will load an entire file (stream) and parse it en masse. Conversely, YAML::Tiny only reads the first document in the stream and stops. Other implementations like PyYAML are lazy and iterate over the next document only upon request. For very large files in which one plans to handle the documents independently, instantiating the entire file before processing may be prohibitive. Thus in YAML::BASE, occasionally one must chunk a file into documents and parse those individually. Fortunately, YAML makes this easy since this simply requires splitting on the document separator, m/^---/. That strategy could be disrupted if anchor and reference tags happen to lie in different documents of the same file.
See also
Other simplified markup languages include:
*JSON, which is almost a subset of YAML
*Simple Outline XML
*OGDL
*S-expressions
*Plist, the object serialization format from NEXTSTEP.
*SDL
Notes and references
External links
* [http://yaml.org YAML.org]
* [http://yaml.org/spec/ YAML Specification]
* [http://yaml4r.sourceforge.net/cookbook/ YAML Cookbook--Equivalent data structures in YAML and Ruby]
* [http://yaml.kwiki.org/?YamlInFiveMinutes YAML in Five Minutes]
* [http://www.ibm.com/developerworks/library/x-matters23.html YAML improves on XML] Intro to YAML in Python
* [http://redhanded.hobix.com/inspect/yamlIsJson.html YAML as a superset of JSON]
* [http://www.kuwata-lab.com/kwalify/ Kwalify Schema definition for YAML]
* [http://list.alwayspages.com/info/listsin5mins Lists in 5 minutes]
Wikimedia Foundation. 2010.