Data exchange

Data exchange is the process of taking data structured under a source schema and transforming it into data structured under a target schema, so that the target data accurately represents the source data. Data exchange is similar to the related concept of data integration, except that in data exchange the data is actually restructured, possibly with loss of content. Given all the constraints, there may be no way to transform a given instance. Conversely, there may be many ways to transform it (possibly infinitely many), in which case a "best" solution must be identified and justified.
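The following is a minimal sketch of the idea, not a definitive implementation. The schemas are hypothetical: a source relation Emp(name, city) is restructured into target relations Person(name, pid) and Lives(pid, city). Because the source holds no identifiers, the transformation has to invent them, which is one way that many different target instances can all be acceptable.

```python
from itertools import count

# Hypothetical source instance: rows of Emp(name, city).
source_emp = [("Ada", "London"), ("Alan", "Manchester")]


def exchange(emp_rows):
    """Materialize a target instance under Person(name, pid) and Lives(pid, city).

    The source has no person identifiers, so each one is an invented
    placeholder ("N1", "N2", ...). Any other choice of distinct identifiers
    would satisfy the mapping equally well.
    """
    fresh = count(1)
    person, lives = [], []
    for name, city in emp_rows:
        pid = f"N{next(fresh)}"
        person.append((name, pid))
        lives.append((pid, city))
    return {"Person": person, "Lives": lives}


def satisfies_mapping(emp_rows, target):
    """Check that every source row is recoverable by joining Person and Lives."""
    joined = {
        (name, city)
        for name, pid in target["Person"]
        for other_pid, city in target["Lives"]
        if pid == other_pid
    }
    return all(row in joined for row in emp_rows)


target = exchange(source_emp)
print(target)
print(satisfies_mapping(source_emp, target))
```

Using fresh placeholders rather than concrete guesses keeps the target instance from asserting anything the source does not say, which is one reasonable reading of a "best" solution in this small setting.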

Data exchange languages

A data exchange language is a language that is domain-independent and can be used for any kind of data. Its semantic expressiveness is generally judged by comparison with that of natural languages. The term is also applied to any file format that can be read by more than one program, including proprietary formats such as Microsoft Office documents. However, a file format is not a real language, as it lacks a grammar and a vocabulary.

Practice has shown that certain types of formal languages are better suited to this task than others, since their specification is driven by a formal process rather than by the needs of a particular software implementation. For example, XML is a markup language that was designed to enable the creation of dialects (domain-specific sublanguages), and it is now a popular choice, particularly on the Internet. However, it does not contain domain-specific dictionaries or fact types. Reliable data exchange benefits from the availability of standard dictionary-taxonomies and of tool libraries such as parsers, schema validators and transformation tools.
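As an illustration of an XML dialect and of the role played by parser libraries, here is a small sketch using the parser in the Python standard library. The address-book dialect and its element names are invented for this example. The standard library does not ship a schema validator, so a hand-written check of required child elements stands in for one.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML dialect for an address book; element names are invented.
DOCUMENT = """\
<addressbook>
  <contact id="1">
    <name>Ada Lovelace</name>
    <email>ada@example.org</email>
  </contact>
  <contact id="2">
    <name>Alan Turing</name>
    <email>alan@example.org</email>
  </contact>
</addressbook>
"""

# Child elements every <contact> must carry (a stand-in for a real schema).
REQUIRED_CHILDREN = ("name", "email")


def read_contacts(xml_text):
    """Parse the document and return one dict per <contact> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "addressbook":
        raise ValueError(f"unexpected root element: {root.tag}")
    contacts = []
    for node in root.findall("contact"):
        record = {"id": node.get("id")}
        for child in REQUIRED_CHILDREN:
            value = node.findtext(child)
            if value is None:
                raise ValueError(f"contact {record['id']} lacks <{child}>")
            record[child] = value
        contacts.append(record)
    return contacts


contacts = read_contacts(DOCUMENT)
print(contacts)
```

The generic parser handles the markup, while everything specific to the dialect (which elements exist and which are required) has to be supplied separately, here by the REQUIRED_CHILDREN tuple.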

Popular languages used for data exchange

The following is an incomplete list of popular generic languages used for data exchange in multiple domains.

Language | Schemas | Flexible | Semantic verification | Dictionary-Taxonomy | Synonyms and homonyms | Dialecting | Web or ISO standard | Transformations | Lightweight | Human readable | Compatibility
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
XML | Yes[1] | Yes | No | No | No | Yes | Yes | Yes | No | No | subset of SGML, HTML
Atom | Yes | Unknown | Unknown | Unknown | Unknown | Yes | Yes | Yes | No | No | XML dialect
JSON | No | Unknown | Unknown | Unknown | Unknown | No | Yes | No | Yes | No | subset of JavaScript
YAML | No[2] | Unknown | Unknown | Unknown | Unknown | No | No | No[2] | Yes | Yes[3] | superset of JSON
REBOL | Yes[6] | Yes | No | Yes | Yes | Yes | No | Yes[6] | Yes | Yes[4] |
Gellish | Yes | Yes | Yes | Yes[7] | Yes | Yes | ISO | No | Yes | Partial[5] | SQL, RDF/XML, OWL

Nomenclature

  • Schemas - Whether the language definition is available in a computer interpretable form.
  • Flexible - Whether the language enables extension of the semantic expression capabilities without modifying the schema.
  • Semantic verification - Whether the language definition enables semantic verification of the correctness of expressions in the language.
  • Dictionary-Taxonomy - Whether the language includes a dictionary and a taxonomy (subtype-supertype hierarchy) of concepts with inheritance.
  • Synonyms and homonyms - Whether the language includes and supports the use of synonyms and homonyms in the expressions.
  • Dialecting - Whether the language definition is available in multiple natural languages or dialects.
  • Web or ISO standard - Organization that endorsed the language as a standard.
  • Transformations - Whether the language includes a translation to other standards.
  • Lightweight - Whether a lightweight version is available, in addition to a full version.
  • Human readable - Whether expressions in the language are readable by humans without training.
  • Compatibility - Which other tools are possible or required when using the language.

Notes:

  1. The schema of XML contains a very limited grammar and vocabulary.
  2. Available as an extension.
  3. In the default format, not the compact syntax.
  4. The syntax is fairly simple (the language was designed to be human-readable); the dialects may require domain knowledge.
  5. The standardized fact types are denoted by standardized English phrases, whose interpretation and use require some training.
  6. The Parse dialect is used to specify, validate, and transform dialects.
  7. The English version includes a Gellish English Dictionary-Taxonomy that also includes standardized fact types (that is, kinds of relations).

XML for data exchange

XML is popular for data exchange on the World Wide Web for several reasons. First of all, it is closely related to the pre-existing standards Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML), so a parser written to support these two languages can easily be extended to support XML as well. For example, XHTML has been defined as a format that is formally XML but is understood correctly by most (if not all) HTML parsers. This led to quick adoption of XML support in web browsers and in the toolchains used for generating web pages.

JSON for data exchange

JSON (JavaScript Object Notation) originated as a part of the JavaScript programming language and was split out into a low-level format for structured data exchange. Although it was not originally designed for data exchange, it proved useful for that purpose. In contrast to XML, the JSON format itself includes no schema definition and no support for dialecting. Its key benefits are its low overhead (the amount of data needed for structuring) compared to XML and its similarly wide support: every web browser that supports JavaScript can also process JSON.
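The overhead difference can be made concrete with a rough sketch that serializes one record both ways using the Python standard library. The record and the XML element names are made up for illustration, and the comparison is indicative only, since the result depends on how the XML is designed.

```python
import json
import xml.etree.ElementTree as ET

# A made-up record used only for the size comparison.
record = {"name": "Ada", "email": "ada@example.org", "age": 36}


def to_json(data):
    """Serialize to JSON without optional whitespace."""
    return json.dumps(data, separators=(",", ":"))


def to_xml(data):
    """Serialize to XML with one child element per field."""
    root = ET.Element("record")
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)

print(len(json_text), json_text)
print(len(xml_text), xml_text)

# JSON round-trips the record, including the numeric type of "age".
restored = json.loads(json_text)
print(restored == record)
```

In the XML form every field name appears twice (opening and closing tag) and the number 36 becomes plain text, whereas the JSON form names each field once and keeps the value numeric.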

YAML for data exchange

YAML is a language that was designed to be human-readable and therefore easy to edit with any standard text editor. Its notation is often similar to reStructuredText or wiki syntax, which also aim to be readable by both humans and computers. YAML 1.2 also includes a shorthand notation that is compatible with JSON, so any JSON document is also valid YAML; the converse, however, does not hold.
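The relationship between the two formats can be checked with a short sketch. It assumes the third-party PyYAML package is installed and skips the YAML step otherwise. PyYAML targets an earlier YAML revision than 1.2, so the check is expected to hold for simple documents like this one rather than serving as a guarantee for every JSON document.

```python
import json

try:
    import yaml  # PyYAML, a third-party package (assumed to be installed)
except ImportError:
    yaml = None

json_text = json.dumps({"name": "Ada", "tags": ["math", "engines"], "age": 36})


def load_both(text):
    """Parse the same text as JSON and, when PyYAML is present, as YAML."""
    from_json = json.loads(text)
    from_yaml = yaml.safe_load(text) if yaml is not None else None
    return from_json, from_yaml


from_json, from_yaml = load_both(json_text)
print(from_json)
print(from_yaml)

# The converse direction: block-style YAML is not valid JSON.
yaml_only = "name: Ada\ntags:\n  - math\n"
try:
    json.loads(yaml_only)
    parses_as_json = True
except json.JSONDecodeError:
    parses_as_json = False
print(parses_as_json)
```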

REBOL for data exchange

REBOL is a language that was designed to be human-readable and easy to edit using any standard text editor. To achieve that, it uses a simple free-form syntax with minimal punctuation and a rich set of datatypes. REBOL datatypes such as URLs, e-mail addresses, date and time values, tuples, strings and tags follow the common standards. REBOL is designed in a metacircular fashion and therefore needs no additional meta-language. Because of this metacircularity, the Parse dialect, which is used (not exclusively) to define and transform REBOL dialects, is itself a dialect of REBOL. REBOL was used as a source of inspiration by the designer of JSON.

Gellish for data exchange

Gellish English is a formalized subset of natural English. It includes a simple grammar and a large, extensible English Dictionary-Taxonomy that defines general and domain-specific terminology (terms for concepts). The concepts are arranged in a subtype-supertype hierarchy (a taxonomy), which supports inheritance of knowledge and requirements. The Dictionary-Taxonomy also includes standardized fact types (also called relation types). Together, the terms and relation types can be used to create and interpret expressions of facts, knowledge, requirements and other information. Gellish can be used in combination with SQL, RDF/XML, OWL and various other meta-languages. The Gellish standard is being adopted as ISO 15926-11.
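The idea of expressing facts as terms joined by relation types, with requirements inherited down a taxonomy, can be sketched as follows. This is not Gellish itself: the terms and relation phrases are illustrative placeholders rather than entries taken from the Gellish dictionary, and single inheritance is assumed for brevity.

```python
# Each fact is (left term, relation type, right term). The phrases and terms
# are illustrative placeholders, not entries from the Gellish dictionary.
SPECIALIZATION = "is a specialization of"

facts = [
    ("pump", SPECIALIZATION, "equipment item"),
    ("centrifugal pump", SPECIALIZATION, "pump"),
    ("equipment item", "shall have", "tag number"),
    ("pump", "shall have", "capacity"),
]


def supertypes(term):
    """Return the term followed by its supertypes, nearest first."""
    chain = [term]
    current = term
    while True:
        parents = [
            right
            for left, relation, right in facts
            if left == current and relation == SPECIALIZATION
        ]
        if not parents:
            return chain
        current = parents[0]  # single inheritance, assumed for brevity
        chain.append(current)


def inherited(term, relation):
    """Collect right-hand terms of a relation from the term and its supertypes."""
    return [
        right
        for ancestor in supertypes(term)
        for left, rel, right in facts
        if left == ancestor and rel == relation
    ]


print(supertypes("centrifugal pump"))
print(inherited("centrifugal pump", "shall have"))
```

A requirement stated once for a general concept applies to every subtype without being restated, which is the practical benefit of keeping the concepts in a subtype-supertype hierarchy.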


Wikimedia Foundation. 2010.
