DataCleaner

Developer(s): eobjects.org
Stable release: 2.0
Written in: Java
Operating system: Cross-platform
Type: Data profiling, data quality
License: Lesser General Public License
Website: http://datacleaner.eobjects.org

DataCleaner is the flagship application of the eobjects.org[1] open source community. It is a data quality application suite with functionality for data profiling, transformation and reporting. The project was founded in late 2007 by Danish student Kasper Sørensen, who wrote a term paper[2] on the process of establishing the project and on the practices of open source software development.

Supported datastores

DataCleaner supports read access to many different types of datastores.

History

0.x: A school project

From early on, the DataCleaner 0.x versions were released as part of Kasper Sørensen's term paper project. The 0.x versions had a user concept similar to that of the later 1.x versions, but the underlying querying mechanism was based on a single data factory pattern, in which the application could retrieve data from the various datastores using only a single method of retrieval (get all rows).

1.x: An independent OSS project

The 1.x versions of DataCleaner gained considerable popularity among data quality (DQ) professionals. The application was partitioned into three data quality function areas:

Profiler

The profiler in DataCleaner enables the user to gain insight into the content of the datastore. It can calculate and present a range of metrics that help the user become aware of and understand data quality issues. Examples of such metrics are the distribution of values, maximum, minimum and average values, and the patterns used in values.
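The kinds of metrics listed above can be illustrated with a small, generic sketch. This is not DataCleaner's own code or API (DataCleaner is written in Java); the function names `pattern_of` and `profile_column` and the pattern notation (`A`/`a` for letters, `9` for digits) are assumptions made purely for illustration.

```python
from collections import Counter
from statistics import mean


def pattern_of(value):
    """Reduce a string to a pattern: upper-case letters become 'A',
    lower-case letters 'a', digits '9'; other characters are kept.
    (Hypothetical notation, chosen for this example only.)"""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute a few simple profiling metrics for one column of values."""
    non_null = [v for v in values if v is not None]
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    text = [v for v in non_null if isinstance(v, str)]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        # Value distribution: how often each distinct value occurs.
        "distribution": Counter(non_null),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "average": mean(numeric) if numeric else None,
        # Pattern distribution over the string values only.
        "patterns": Counter(pattern_of(v) for v in text),
    }
```

For example, profiling the column `["2100", "DK-2100", "2200"]` with this sketch groups the values under the patterns `9999` and `AA-9999`, which makes the inconsistently formatted entry easy to spot.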

Validator

The validator assumes a higher degree of data insight, since it enables the user to create business rules that the data must honor. Rules can be defined in a variety of ways: through JavaScript, lookup dictionaries, regular expressions and more.
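A minimal sketch of rule-based validation follows, showing a regular-expression rule and a lookup-dictionary rule. It is a generic illustration, not DataCleaner's implementation; `regex_rule`, `dictionary_rule` and `validate` are hypothetical names, and rows are assumed to be plain dictionaries keyed by column name.

```python
import re


def regex_rule(pattern):
    """Build a rule that passes only strings fully matching the pattern."""
    compiled = re.compile(pattern)
    return lambda value: isinstance(value, str) and compiled.fullmatch(value) is not None


def dictionary_rule(allowed):
    """Build a rule that passes only values found in a lookup dictionary."""
    allowed_set = set(allowed)
    return lambda value: value in allowed_set


def validate(rows, rules):
    """Return (row_index, column, value) for every value that breaks its rule.

    `rows` is a list of dicts; `rules` maps a column name to a rule function.
    """
    violations = []
    for index, row in enumerate(rows):
        for column, rule in rules.items():
            value = row.get(column)
            if not rule(value):
                violations.append((index, column, value))
    return violations
```

Because each rule is just a function from a value to a boolean, a scripted rule (the role JavaScript plays in the description above) could be plugged in the same way as any other callable.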

Comparator

The comparator enables a user to compare two separate datastores and look for values from one datastore within another datastore and vice versa.
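The two-way lookup described above can be sketched with set operations. This is an illustrative example only, not DataCleaner's code; `compare_columns` is a hypothetical name, and each datastore is assumed to be reduced to a single list of comparable, hashable values.

```python
def compare_columns(left_values, right_values):
    """Compare two columns of values in both directions.

    Reports values found only in the left column, only in the right
    column, and in both.
    """
    left, right = set(left_values), set(right_values)
    return {
        "only_in_left": sorted(left - right),
        "only_in_right": sorted(right - left),
        "in_both": sorted(left & right),
    }
```

Using sets keeps each lookup cheap, at the cost of ignoring duplicates and row order; a comparison that needs to count occurrences would use a multiset such as `collections.Counter` instead.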

2.x: Acquisition by Human Inference

On 14 February 2011, it was announced that the data quality vendor Human Inference had acquired eobjects.org, hired Kasper Sørensen, and participated in and sponsored the development of DataCleaner 2.0. DataCleaner 2.0 was released the same day. It introduced a new user experience in which all of the previous function areas were unified into a single workbench.

License history

As of version 1.5, DataCleaner changed its license from the Apache License version 2.0 to the Lesser General Public License. According to the DataCleaner website, the change was made to "ensure that improvements are submitted back to the projects" and so that "we don't risk that anyone sell modified versions of our projects".[3]

References

  1. eobjects.org
  2. Sørensen, Kasper (2008). Udvikling og styring af Open Source projekter (in Danish). Cand.Merc.Dat, Copenhagen Business School. Downloadable from http://eobjects.org/resources/download/afloesningsopgave.pdf
  3. eobjects.org news site, http://eobjects.org/trac/blog/change-in-preferred-license

Wikimedia Foundation. 2010.
