Round-trip format conversion

The term round-trip is commonly used in document conversion, particularly conversion involving markup languages such as XML and SGML. A round-trip converts a document in format A (docA) to one in format B (docB) and then back again to format A (docA′). If docA and docA′ are identical, no information has been lost and the round-trip has been successful. More generally, the term covers converting any data representation to another and back again, including from one data structure to another.
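The definition above can be sketched as a small check. This is an illustrative sketch only: the helper name `round_trips` is hypothetical, and a Python dict and JSON text stand in for format A and format B.

```python
import json


def round_trips(doc_a, to_b, to_a):
    """Return True if converting doc_a to format B and back reproduces it."""
    doc_b = to_b(doc_a)          # docA -> docB
    doc_a_prime = to_a(doc_b)    # docB -> docA'
    return doc_a_prime == doc_a


# Format A: a Python dict. Format B: JSON text.
record = {"title": "Report", "pages": 12, "tags": ["draft", "internal"]}
print(round_trips(record, json.dumps, json.loads))  # True

# JSON has no tuple type, so the tuple comes back as a list.
point = {"xy": (1, 2)}
print(round_trips(point, json.dumps, json.loads))  # False
```

The second case shows that a round-trip can fail even between closely related representations, because the intermediate format cannot express every distinction the original makes.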

Information loss

When a document in one format is converted to another, some information is likely to be lost. For example, if an HTML document is saved as plain text (*.txt), all the markup (structure, formatting, superscripts, …) is lost. Compound documents frequently lose images and other embedded objects. If the text file is converted back to the original format, that information will necessarily be missing.
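A minimal sketch of the HTML example, using Python's standard `html.parser` module. The class `TextExtractor` and the function `html_to_text` are names chosen for this illustration. Two different HTML documents reduce to the same plain text, so no reverse conversion could tell them apart.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data, discarding every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html_a = "<p>E = mc<sup>2</sup> is <b>famous</b>.</p>"
html_b = "<p>E = mc2 is famous.</p>"

text_a = html_to_text(html_a)
text_b = html_to_text(html_b)

print(text_a)            # E = mc2 is famous.
print(text_a == text_b)  # True: the superscript and bold markup are gone
```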

A similar effect happens with image formats. Some formats, such as JPEG, achieve compression by discarding a small amount of information. If a lossless file, such as a BMP or PNG file, is converted to JPEG and back again, the result will differ from the original (although it may be visually very similar).
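The effect can be imitated without an image library. The sketch below is not JPEG; it uses coarse quantization of sample values as a stand-in for a lossy encoder, and the function names and the step size of 8 are assumptions made for illustration.

```python
def lossy_encode(samples, step=8):
    """Stand-in for a lossy encoder: keep only the nearest multiple of step."""
    return [round(s / step) for s in samples]


def lossy_decode(codes, step=8):
    """Reverse the encoding as far as possible."""
    return [c * step for c in codes]


original = [12, 13, 200, 201, 250]
restored = lossy_decode(lossy_encode(original))

# The restored values are close to the originals but not identical.
errors = [abs(a - b) for a, b in zip(original, restored)]
print(restored != original)  # True
print(max(errors) <= 4)      # True: error is bounded by half the step size
```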

Bitwise differences between the initial and final documents do not necessarily mean that information has been lost. Some formats have undefined fields, or fields whose contents have no effect on the result.
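As a small illustration, assume JSON as the format: the whitespace between tokens carries no meaning, so a document can change byte for byte during a round-trip while its content stays the same.

```python
import json

doc_a = '{"name": "Ada",   "year": 1843}'

# Round-trip through Python objects and back to JSON text.
doc_a_prime = json.dumps(json.loads(doc_a))

print(doc_a == doc_a_prime)                          # False: the text differs
print(json.loads(doc_a) == json.loads(doc_a_prime))  # True: the content does not
```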

Markup languages

Markup languages such as XML can, in principle, hold any information, so the process docA → docX → docA′ can be designed to avoid information loss. It is now common to convert legacy formats to XML formats because XML offers greater interoperability and a wider set of available tools. For example, Word documents can be converted to an XML format and then reimported.

The XML document should contain identical information to the legacy format. An important condition is that the round-trip (legacy → XML → legacy′) should result in effectively identical documents. Because some document structures allow flexibility in content order, whitespace, case-sensitivity, etc., it is useful to have a means of canonicalizing the legacy format. The full round-trip may then be:

legacy → canonicalLegacy → XML → legacy′ → canonicalLegacy′

If canonicalLegacy = canonicalLegacy′, the round-trip has been successful.
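The chain above can be sketched end to end. Everything in this example is hypothetical: the "legacy" format is an invented key = value text format in which key case, surrounding whitespace, blank lines and line order are assumed not to matter, and the XML layout (`config` and `entry` elements) is chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def canonicalize(legacy):
    """Canonical form: lower-case keys, trimmed whitespace, sorted lines."""
    pairs = []
    for line in legacy.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        key, _, value = line.partition("=")
        pairs.append((key.strip().lower(), value.strip()))
    return "\n".join(f"{k}={v}" for k, v in sorted(pairs))


def legacy_to_xml(canonical):
    """Convert canonical legacy text to an XML string."""
    root = ET.Element("config")
    for line in canonical.splitlines():
        key, _, value = line.partition("=")
        ET.SubElement(root, "entry", name=key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_legacy(xml_text):
    """Convert back; the writer's spacing differs from the canonical form."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{e.get('name')} = {e.text or ''}" for e in root.iter("entry")
    )


legacy = "Title =  Annual Report\n\nauthor=J. Smith\n"

canonical = canonicalize(legacy)                # legacy -> canonicalLegacy
xml_doc = legacy_to_xml(canonical)              # canonicalLegacy -> XML
legacy_prime = xml_to_legacy(xml_doc)           # XML -> legacy'
canonical_prime = canonicalize(legacy_prime)    # legacy' -> canonicalLegacy'

print(legacy_prime == legacy)        # False: raw texts differ
print(canonical_prime == canonical)  # True: the round-trip succeeded
```

Comparing the canonical forms rather than the raw texts keeps harmless differences in spacing, case and order from being reported as information loss.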

Limitation

An application can claim to round-trip without genuinely doing so. For example, it may save the original data from docA as a field in docX, so that the reverse transformation to docA′ simply extracts that field. While this may be needed in some cases, the idea of a round-trip conversion is to go through another format representation or data structure and back again.
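A hypothetical converter of this kind might look like the sketch below. The field names `content` and `original` and both function names are invented for illustration. The equality test passes, yet the converted content is never used on the way back, which becomes visible as soon as docX is edited.

```python
import json


def to_x(doc_a):
    """Lossy conversion (upper-casing discards case) plus a verbatim copy."""
    return json.dumps({"content": doc_a.upper(), "original": doc_a})


def to_a(doc_x):
    """Ignore the converted content and return the stored copy."""
    return json.loads(doc_x)["original"]


doc_a = "Hello, World"
doc_x = to_x(doc_a)
print(to_a(doc_x) == doc_a)  # True: looks like a perfect round-trip

# Edit the document while it is in format X.
edited = json.loads(doc_x)
edited["content"] = "GOODBYE"
print(to_a(json.dumps(edited)))  # Hello, World: the edit is silently dropped
```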

Usage

The term appears to be in common use but is not recorded in dictionaries. A typical usage occurs in [http://mailman.ic.ac.uk/pipermail/xml-dev/1999-March/010781.html], but the term is likely to have been used before then.

See also

* Lossy data conversion


Wikimedia Foundation. 2010.
