Data transformation
In the context of metadata and data warehousing, a data transformation converts data from a source data format into a destination data format.
Data transformation can be divided into two steps:
- data mapping, which maps data elements from the source to the destination and captures any transformation that must occur
- code generation, which creates the actual transformation program
Mapping one data element to another is frequently complicated by transformations that require one-to-many or many-to-one transformation rules.
The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also create transformations in easy-to-maintain computer languages such as Java or XSLT.
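The two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of any particular tool: the mapping format, the field names (cust_id, city_zip, and so on), and the helper names generate_transform_source and compile_transform are all hypothetical. The mapping specification covers a one-to-one, a many-to-one, and a one-to-many rule, and the generator turns it into the source text of an executable function.

```python
# Hypothetical mapping specification: each rule names its source fields,
# its destination fields, and a Python expression over the source record `src`.
MAPPING = [
    # one-to-one: rename a field
    {"sources": ["cust_id"], "targets": ["customer_id"],
     "expr": "src['cust_id']"},
    # many-to-one: combine two source elements into one destination element
    {"sources": ["first", "last"], "targets": ["full_name"],
     "expr": "src['first'] + ' ' + src['last']"},
    # one-to-many: split one source element into two destination elements
    {"sources": ["city_zip"], "targets": ["city", "zip"],
     "expr": "src['city_zip'].rsplit(' ', 1)"},
]


def generate_transform_source(mapping, func_name="transform"):
    """Code generation step: turn the mapping spec into Python source text."""
    lines = [f"def {func_name}(src):", "    dst = {}"]
    for rule in mapping:
        targets = rule["targets"]
        if len(targets) == 1:
            lines.append(f"    dst[{targets[0]!r}] = {rule['expr']}")
        else:
            # one-to-many: unpack the expression's result into several targets
            lhs = ", ".join(f"dst[{t!r}]" for t in targets)
            lines.append(f"    {lhs} = {rule['expr']}")
    lines.append("    return dst")
    return "\n".join(lines) + "\n"


def compile_transform(mapping):
    """Compile the generated source into a callable transformation program."""
    namespace = {}
    exec(generate_transform_source(mapping), namespace)
    return namespace["transform"]


transform = compile_transform(MAPPING)
record = {"cust_id": 7, "first": "Ada", "last": "Lovelace",
          "city_zip": "London 12345"}
print(generate_transform_source(MAPPING))  # the generated program
print(transform(record))                   # the transformed record
```

Keeping the mapping as data and generating the program from it means a changed rule only touches the specification; the generated source can be inspected or regenerated at any time.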
When the mapping is indirect via a mediating data model, the process is also called data mediation.
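As a rough sketch of data mediation, the following Python fragment maps a source record into a mediating model and from there into the destination format. The two formats, their field names, and the function names are hypothetical and chosen only for illustration.

```python
# Mediating (canonical) model assumed here: a dict with the keys
# "name" and "temperature_c". Formats A and B are invented for this sketch.

def from_format_a(rec):
    """Format A (hypothetical) stores Fahrenheit under 'temp_f'."""
    return {"name": rec["sensor"],
            "temperature_c": (rec["temp_f"] - 32) * 5 / 9}


def to_format_b(canonical):
    """Format B (hypothetical) stores Kelvin under 'kelvin'."""
    return {"id": canonical["name"],
            "kelvin": canonical["temperature_c"] + 273.15}


def mediate(rec, to_canonical, from_canonical):
    """Indirect mapping: source -> mediating model -> destination."""
    return from_canonical(to_canonical(rec))


result = mediate({"sensor": "s1", "temp_f": 212.0}, from_format_a, to_format_b)
print(result)
```

With a mediating model, each format needs only one mapping to and one mapping from the model, instead of a direct mapping for every pair of formats.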
Transformational languages
Numerous languages are available for performing data transformation, varying in their accessibility (cost) and general usefulness. Many transformation languages require a grammar to be provided; in many cases the grammar is structured using something closely resembling Backus–Naur Form (BNF). Examples of such languages include:
- AWK - one of the oldest and most popular text data transformation languages;
- Perl - a high-level language with both procedural and object-oriented syntax, capable of powerful operations on binary or text data;
- Template languages - specialized for transforming data into documents (see also template processor);
- TXL - prototyping language-based descriptions, used for source code or data transformation;
- XSLT - the standard XML data transformation language (XQuery can be used in its place in many applications).
Although transformational languages are typically best suited for transformation, something as simple as regular expressions can achieve useful transformations. Text editors such as Emacs or TextPad support regular expressions with arguments, which allows all instances of a particular pattern to be replaced with another pattern that reuses parts of the original. For example:
    foo ("some string", 42, gCommon);
    bar (someObj, anotherObj);
    foo ("another string", 24, gCommon);
    bar (myObj, myOtherObj);
could both be transformed into a more compact form like:
    foobar("some string", 42, someObj, anotherObj);
    foobar("another string", 24, myObj, myOtherObj);
In other words, every invocation of foo with three arguments that is followed by an invocation of bar with two arguments is replaced with a single function invocation using some or all of the original arguments.
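The replacement described above can be sketched with Python's re module. The pattern and the function name merge_calls are illustrative assumptions; the pattern also assumes that no argument contains a comma or a parenthesis, so a string literal such as "a, b" would defeat it.

```python
import re

SOURCE = '''foo ("some string", 42, gCommon);
bar (someObj, anotherObj);
foo ("another string", 24, gCommon);
bar (myObj, myOtherObj);
'''

# foo with three arguments, followed by bar with two arguments.
# Assumption: arguments contain no commas or parentheses themselves.
PATTERN = re.compile(
    r'\bfoo\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*;\s*'
    r'\bbar\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*;'
)


def merge_calls(text):
    # Keep arguments 1, 2, 4 and 5; the third argument of foo is dropped.
    return PATTERN.sub(r'foobar(\1, \2, \4, \5);', text)


print(merge_calls(SOURCE))
```

The numbered groups play the role of the "arguments" of the regular expression: each captured part of the original pattern can be reused, reordered, or dropped in the replacement.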
Another advantage of regular expressions is that they do not fail the null transform test: using your transformational language of choice, run a sample program through a transformation that performs no transformations, and check that the output is identical to the input. Many transformational languages fail this test.
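A small Python sketch of the null transform test follows. The function names are hypothetical, and reprinting_null_transform is only a stand-in for a tool that tokenizes its input and prints it back; treating lost layout as the cause of failure is an assumption of this sketch, not a claim about any specific language.

```python
import re


def regex_null_transform(text):
    # Rewrite every word as itself; text outside the matches is copied verbatim.
    return re.sub(r'(\w+)', r'\1', text)


def reprinting_null_transform(text):
    # Stand-in for a tool that tokenizes and re-prints: the layout is lost.
    return " ".join(text.split())


def passes_null_transform_test(transform, sample):
    # The test passes only if the output is identical to the input.
    return transform(sample) == sample


SAMPLE = "void f ()   {\n\t/* odd   spacing */ return ;\n}\n"
print(passes_null_transform_test(regex_null_transform, SAMPLE))       # True
print(passes_null_transform_test(reprinting_null_transform, SAMPLE))  # False
```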
Transforming source code
Program synthesis, automatic programming, and other fields use data transformation strategies to translate, adapt, or even generate software source code. Conversely, source transformation tools can be used for data transformation, typically to transform "document source code" such as HTML or an XML dialect (see also template processors).
For further information on (software) source transformation, see [1] (Chapter 2.4) or [2].
Generally, the different types of transformation fall into one of two categories [3]:
- Translation: a transformation from a language X into another language Y.
- Rephrasing: a transformation within the same language, in which the program is merely stated in a different way.
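The two categories can be illustrated on small arithmetic expressions using Python's standard ast module. The choice of postfix notation as the target language and of constant folding as the rephrasing are assumptions made for this sketch; the function and class names are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast
import operator

# Supported operators: node type -> (postfix symbol, evaluation function)
OPS = {ast.Add: ("+", operator.add),
       ast.Sub: ("-", operator.sub),
       ast.Mult: ("*", operator.mul)}


def translate_to_postfix(expr):
    """Translation: Python infix arithmetic -> postfix notation."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return walk(node.left) + walk(node.right) + [OPS[type(node.op)][0]]
        if isinstance(node, ast.Constant):
            return [repr(node.value)]
        if isinstance(node, ast.Name):
            return [node.id]
        raise ValueError("unsupported expression")
    return " ".join(walk(ast.parse(expr, mode="eval").body))


class FoldConstants(ast.NodeTransformer):
    """Rephrasing: fold constant subexpressions, staying within Python."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        if (type(node.op) in OPS
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            value = OPS[type(node.op)][1](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node


def rephrase(expr):
    tree = FoldConstants().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree)  # Python 3.9+


print(translate_to_postfix("x * (2 + 3)"))  # x 2 3 + *
print(rephrase("x * (2 + 3)"))              # x * 5
```

The first function changes the language of the program; the second keeps the language and changes only how the same computation is stated.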
Example
A difficult problem to address in C++ is "unstructured preprocessor directives": preprocessor directives that do not enclose blocks of code with simple grammatical descriptions, as in this function definition:
    void MyFunc ()
    {
      if (x>17)
        { printf("test");
    # ifdef FOO
        } else {
    # endif
            if (gWatch)
                mTest = 42;
        }
    }
A really general solution to handling this is very hard because such preprocessor directives can essentially edit the underlying language in arbitrary ways. However, because such directives are not, in practice, used in completely arbitrary ways, one can build practical tools for handling preprocessed languages. The DMS Software Reengineering Toolkit is capable of handling structured macros and preprocessor conditionals. Brabrand and Schwartzbach (2000)[4] offer another approach, replacing the C preprocessor with a metamorphic one.
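To make "unstructured" concrete, the following Python sketch extracts each conditional region from the function definition above and checks whether its braces balance on their own. This is a simple heuristic written for illustration, not how the tools named above work: it handles only non-nested ifdef/endif pairs and ignores braces inside strings and comments, and all names are hypothetical.

```python
import re

SOURCE = """\
void MyFunc ()
{
  if (x>17)
    { printf("test");
# ifdef FOO
    } else {
# endif
        if (gWatch)
            mTest = 42;
    }
}
"""

DIRECTIVE = re.compile(r'^\s*#\s*(ifdef|endif)\b\s*(\w*)')


def conditional_regions(text):
    """Return [(macro, [lines])] for each non-nested ifdef/endif region."""
    regions, current = [], None
    for line in text.splitlines():
        m = DIRECTIVE.match(line)
        if m and m.group(1) == "ifdef":
            current = (m.group(2), [])
        elif m and m.group(1) == "endif" and current is not None:
            regions.append(current)
            current = None
        elif current is not None:
            current[1].append(line)
    return regions


def is_structured(lines):
    """Heuristic: a region is structured if its braces balance by themselves."""
    depth = 0
    for ch in "".join(lines):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closes a brace opened outside the region
    return depth == 0


for macro, lines in conditional_regions(SOURCE):
    print(macro, is_structured(lines))  # FOO False
```

The region guarded by FOO closes a brace that was opened before the directive and opens one that is closed after it, so it cannot be described as a complete block in the grammar of the underlying language.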
See also
- File Formats, Transformation, and Migration (related Wikiversity article)
References
- [1] T. Cassidy (2004). "Concurrency Analysis of Java RMI Using Source Transformation and Verisoft". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.5755&rep=rep1&type=pdf
- [2] J. R. Cordy (2006). "The TXL source transformation language". DOI 10.1016/j.scico.2006.04.002
- [3] Eelco Visser (2001). "A Survey of Strategies in Program Transformation Systems". Electronic Notes in Theoretical Computer Science, 57:363-377.
- [4] Claus Brabrand and Michael I. Schwartzbach (2000). "Growing Languages with Metamorphic Syntax Macros". BRICS Report Series RS-00-24. BRICS, Denmark. ISSN 0909-0878.
Wikimedia Foundation. 2010.