Open Source Data Integration

The Open Source Data Integration framework from the [http://snaplogic.org SnapLogic] project is an open source framework for enterprise-scale data integration (see "Open Source Data Integration Framework", http://www.snaplogic.org). The framework is used to design and run data integration processes as pipelines on an open source server. Pipelines are assembled from components that are either built with the framework's open source graphical designer or coded as Python scripts.

Use cases

The open source data integration framework addresses the need to repurpose different sources of data within an enterprise. Rather than hand-coding scripts with ad hoc tools and languages, developers get a coordinated method of accessing, assimilating and publishing data. The framework encourages reuse of components and processes without unnecessarily burdening developers, and it relies on a RESTful architecture and standards-based interfaces so that integrations stay interoperable. Data integration use cases include data migration, data refactoring, enterprise mashups, business analytics and web services. The framework is designed for technical and business users at all levels.

Pipeline metaphor

Pipelines are created in the framework's designer tool using resources built from standard open source components and custom components. A pipeline transforms data from one or more resources, each of which is represented graphically in the designer. Resources are instances of components and can be customized in a forms-based GUI. Developers can use the Python scripting language to create additional components and to run pipelines from the command line.
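
The pipeline metaphor can be illustrated with plain Python generators. The sketch below is not the framework's API: the function names (read_csv, keep_rows, write_json, run_pipeline) and the sample data are assumptions made only for illustration. Each function plays the role of a component, and the pipeline is the chain that connects them.

```python
import csv
import io
import json


def read_csv(text):
    """Connector (hypothetical): yield one dict per CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row


def keep_rows(records, predicate):
    """Transformer (hypothetical): pass through records matching predicate."""
    for record in records:
        if predicate(record):
            yield record


def write_json(records):
    """Connector (hypothetical): serialize all records to a JSON array."""
    return json.dumps(list(records))


def run_pipeline(text):
    """Pipeline: CSV reader -> filter -> JSON writer."""
    records = read_csv(text)
    active = keep_rows(records, lambda r: r["status"] == "active")
    return write_json(active)


# Illustrative input only; not data from the project.
SAMPLE = "name,status\nalice,active\nbob,inactive\ncarol,active\n"

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))
```

Because each stage consumes and yields records lazily, stages can be swapped or reused in other pipelines without changing their neighbours, which is the reuse the designer aims to provide graphically.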

Pipelines are executed in a lightweight server process that runs directly alongside a database, on an application or file server, or on a standalone integration server elsewhere on the network. The server portion of the framework is open source, and instances can be distributed or centralized. In the distributed case, pipelines can run on the same server where the data resides. In the centralized case, one or more dedicated servers can securely access resources anywhere on the network. Either way, data administrators keep control over the data, while business users securely receive the data they need through URIs.
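
Publishing pipeline output through URIs can be sketched as a small WSGI application using only the Python standard library. This is an assumption-laden illustration, not the framework's server: the /feed/customers path, the FEEDS table and the fetch helper are hypothetical, and no access control is shown.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical pipeline outputs keyed by URI path. On a real server these
# records would be produced by running a pipeline, not hard-coded.
FEEDS = {
    "/feed/customers": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
}


def app(environ, start_response):
    """Minimal WSGI application: one URI per published data set."""
    records = FEEDS.get(environ.get("PATH_INFO", ""))
    if records is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown resource"]
    body = json.dumps(records).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def fetch(path):
    """Call the app in-process, the way a GET request for `path` would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(fetch("/feed/customers"))
```

To expose the same application over HTTP, it could be passed to wsgiref.simple_server.make_server; the in-process fetch helper is used here only so the example runs without opening a network port.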

Key features

The open source data integration framework provides:

* A browser-based drag-and-drop interface for combining data integration connectors and transformers into pipelines.
* A full programmatic interface in Python for building connectors and customizing pipelines.
* A command line interface to allow pipelines to run from shell commands, cron jobs, and other interfaces.
* A metadata repository for storing connectors, transformers and pipelines for re-use.
* A collection of free connectors, transformers, pipelines and examples for common data sources such as QuickBooks, Salesforce, Oracle and Apache.
* A collection of free extensions, such as the PHP extension package.
* An open source server to run connectors, transformers and pipelines.
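
The combination of a repository of reusable pipelines and a command line entry point can be sketched with the standard argparse module. Nothing here reflects the framework's actual commands or flags: REPOSITORY, register, copy_customers and the --limit option are hypothetical names chosen for illustration.

```python
import argparse

# Hypothetical in-memory repository: pipeline name -> callable.
REPOSITORY = {}


def register(name):
    """Store a pipeline function in the repository under `name`."""
    def decorator(func):
        REPOSITORY[name] = func
        return func
    return decorator


@register("copy_customers")
def copy_customers(limit):
    """Toy pipeline: 'process' up to `limit` rows and return the count."""
    rows = [{"id": i} for i in range(1, 6)]  # illustrative data only
    return len(rows[:limit])  # limit=None means all rows


def main(argv=None):
    """Parse arguments and run the named pipeline from the repository."""
    parser = argparse.ArgumentParser(description="Run a stored pipeline.")
    parser.add_argument("pipeline", choices=sorted(REPOSITORY))
    parser.add_argument("--limit", type=int, default=None)
    args = parser.parse_args(argv)
    count = REPOSITORY[args.pipeline](args.limit)
    print("%s: %d rows processed" % (args.pipeline, count))
    return 0


if __name__ == "__main__":
    # Demo invocation; a real script would call main() to read sys.argv.
    main(["copy_customers", "--limit", "3"])
```

A script built this way returns a conventional exit status, so a shell wrapper or cron entry can detect failures without parsing the output.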

Additional features

The framework contains a collection of standard connectors and transformers.

Connectors:

* database readers / writers
* RSS reader / generator
* XML reader
* JSON writer
* CSV reader / writer
* fixed width file reader / writer

Transformers:

* join
* sort
* filter
* lookup
* aggregate
* merge
* mixer
* sequence generator
* type converter
* date dimension
* user defined computations
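
Two of the transformers listed above, lookup and aggregate, can be sketched in a few lines of Python. The function signatures, field names and sample data are assumptions for illustration and do not describe how the framework's own transformers are configured.

```python
from collections import defaultdict


def lookup(records, table, key, field):
    """Lookup: add `field` to each record, taken from `table` by `key`."""
    for record in records:
        enriched = dict(record)  # copy so the input is not modified
        enriched[field] = table.get(record[key])
        yield enriched


def aggregate(records, group_by, value):
    """Aggregate: sum `value` for each distinct `group_by` value."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_by]] += record[value]
    return dict(totals)


# Illustrative data only.
ORDERS = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 15},
    {"customer_id": 1, "amount": 5},
]
REGIONS = {1: "east", 2: "west"}


def totals_by_region(orders, regions):
    """Chain the two transformers: enrich orders, then total per region."""
    enriched = lookup(orders, regions, "customer_id", "region")
    return aggregate(enriched, "region", "amount")


if __name__ == "__main__":
    print(totals_by_region(ORDERS, REGIONS))
```

The lookup stage copies each record before adding a field, so the same input can safely feed more than one downstream transformer.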

Implementation

The browser-based, drag-and-drop designer interface requires a web browser with the Adobe Flash plugin. The framework requires Python 2.4.x or 2.5.x; Python 2.5.1 is recommended for new installations. The open source framework, together with the open source server stack, can be downloaded from the community site as a self-installing package.

Supported operating systems

* Windows XP and Server 2003
* Ubuntu 6.10/7.04 (server or desktop)
* Fedora Core 6
* RedHat Enterprise Linux 4/5
* CentOS 4 / 5

External links

* [http://snaplogic.org Project web site]
* [http://blog.snaplogic.org Project blog]
* [http://alexfletcher.typepad.com/all_bets_off/2007/04/snaplogic_innov.html Open Source Unleashed blog]


Wikimedia Foundation. 2010.
