Python Pipelines

Python Pipelines is a cross-platform implementation of the Hartmann pipeline, written in Python.

Pipelines lets you solve a complex problem by breaking it up into a series of smaller, less complex programs. These simple programs, also called stages, can then be hooked together to get the results you want. The output of one stage is the input to the next stage. A series of stages is called a pipeline. Each stage is specified by a stage name and its operands. Pipelines has many built-in stages; you may add your own, written in Python or pseudo-Rexx.

Stages may have more than one input and/or output stream; these streams may be connected to other stages in no particular order.

Pipelines is a superset of pipes as found in Unix shells.

Python Pipelines may be invoked from a command prompt or incorporated into another Python program as an imported module.

In CMS Pipelines, users may write their own stages using the Rexx programming language. Due to the nature of the 'dispatcher' (each stage is a co-routine), these programs are more complex than they could be.

Python Pipelines supports coding the IBM way (in 'pseudo-Rexx'), which makes it fairly easy to incorporate legacy code or to write new stages using a known technology.

Ideally, however, Python Pipelines users learn to write their own stages in Python, using a much simpler approach: each stage has a run() method that is invoked as many times as needed. Seven or more lines of Rexx are thus replaced by two lines of Python.

Documentation

The best and most comprehensive documentation is in http://vm.marist.edu/~pipeline/pipeline.pdf

Examples

Invoking the program from a command prompt:

    python pipe.py "(end ?) < input.txt | A: locate /Hello/ | > found.txt ? A: | > notfound.txt"

The vertical bar separates stages; the ? separates pipelines.

The first stage (<) reads records from file input.txt, passing them one at a time to the second stage (locate).

The label A: tells the pipe processor that locate has 2 outputs. locate passes records that contain Hello to its primary output and the others to its secondary output. Primary output records flow to the third stage (>) where they are written to file found.txt.

Secondary output records flow to the next pipeline, entering at the label A:, thence to the 4th stage (>) where they are written to file notfound.txt.
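The routing described above can be mirrored in plain Python. This is only a sketch of the same dataflow, not the Python Pipelines API: the names locate and run_example are hypothetical, and the sketch handles all records in one batch rather than passing them one at a time as the pipeline does.

```python
def locate(records, needle):
    """Split records into (primary, secondary), like locate's two output streams."""
    primary, secondary = [], []
    for record in records:
        # Records containing the needle go to the primary stream, the rest to the secondary.
        (primary if needle in record else secondary).append(record)
    return primary, secondary


def run_example(input_path, found_path, notfound_path, needle="Hello"):
    # Stage 1 (<): read records, one per line, without the line terminator.
    with open(input_path, encoding="utf-8") as source:
        records = source.read().splitlines()
    # Stage 2 (locate): split the records into the two streams.
    found, notfound = locate(records, needle)
    # Stages 3 and 4 (>): write each stream to its own file.
    for path, stream in ((found_path, found), (notfound_path, notfound)):
        with open(path, "w", encoding="utf-8") as sink:
            sink.writelines(record + "\n" for record in stream)
```

Calling run_example("input.txt", "found.txt", "notfound.txt") would produce the same two files as the command above, assuming input.txt exists.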

A minimal REXX Stage written for CMS Pipelines

This stage passes each input record to the output:

    /* PASSRECORD REXX */
    signal on error
    do forever
      'PEEKTO record'
      /* process and/or test self.record and/or test self.RC */
      'OUTPUT' record
      'READTO'
    end
    error: exit (RC * (RC /= 12 & RC /= 8))

The equivalent in Python for Python Pipelines

    import stage

    class PassRecord(stage.Stage):
        def run(self, record):
            self.output(record)

Why is this so much simpler than the REXX for CMS version?

In CMS Pipelines each stage is a program. All stage programs are started, and each runs either to termination or until it issues a pipeline command (such as PEEKTO, READTO, OUTPUT, ...). The pipeline dispatcher receives control when such a command is issued, and decides which stage to resume next. Stages therefore require state memory, looping, and decision making.
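The co-routine style can be imitated with a Python generator, which may help show why such a stage needs its own loop and state. This is a toy model under stated assumptions, not CMS Pipelines itself: pass_record and dispatch are hypothetical names, the dispatcher serves a single stage instead of choosing among many, and None stands in for end-of-file.

```python
from collections import deque


def pass_record():
    """A stage in the co-routine style: it loops and yields pipeline commands."""
    while True:
        record = yield ("PEEKTO",)   # look at the next input record
        if record is None:           # hypothetical end-of-file signal
            return
        yield ("OUTPUT", record)     # write the record to the output stream
        yield ("READTO",)            # consume the record that was peeked


def dispatch(stage, records):
    """Toy dispatcher for one stage: service each command, then resume the stage."""
    pending = deque(records)
    produced = []
    try:
        command = next(stage)        # run the stage up to its first command
        while True:
            reply = None
            if command[0] == "PEEKTO":
                reply = pending[0] if pending else None
            elif command[0] == "OUTPUT":
                produced.append(command[1])
            else:                    # READTO
                pending.popleft()
            command = stage.send(reply)   # resume the stage with the reply
    except StopIteration:
        pass                         # the stage ran to termination
    return produced
```

Each yield marks a point where the stage gives up control, so the stage must carry its own loop and remember where it was between commands.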

In Python Pipelines streams are connected directly. A record container is passed from stage 1 to stage 2 to stage 3 by a series of method calls. Normally the record container will reach a stage that "consumes" it by returning control back through the prior stages, eventually returning control to a stage that sources another record container or else to the start of the pipeline, which terminates that pipeline.

A record container is an object that holds the record and related information such as eof status.
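The direct-call model can be sketched in a few lines. Everything here is an assumption made for illustration: RecordContainer, Stage, connect, Upper, Collect and drive are hypothetical names, and the real Python Pipelines classes may be organised quite differently.

```python
from dataclasses import dataclass


@dataclass
class RecordContainer:
    """Hypothetical container: the record plus an end-of-file flag."""
    record: str = ""
    eof: bool = False


class Stage:
    """Toy base class: output() hands the container straight to the next stage."""

    def __init__(self):
        self.next_stage = None

    def connect(self, next_stage):
        self.next_stage = next_stage
        return next_stage

    def output(self, container):
        if self.next_stage is not None:
            self.next_stage.run(container)   # a plain method call, no dispatcher

    def run(self, container):
        self.output(container)               # default behaviour: pass through


class Upper(Stage):
    """Transforms the record, then passes the same container downstream."""

    def run(self, container):
        if not container.eof:
            container.record = container.record.upper()
        self.output(container)


class Collect(Stage):
    """Consumes containers; returning from run() hands control back upstream."""

    def __init__(self):
        super().__init__()
        self.records = []

    def run(self, container):
        if not container.eof:
            self.records.append(container.record)


def drive(first_stage, lines):
    """Source one container per line, then a final end-of-file container."""
    for line in lines:
        first_stage.run(RecordContainer(line))
    first_stage.run(RecordContainer(eof=True))
```

For example, after first = Upper(), sink = Collect(), first.connect(sink) and drive(first, ["a", "b"]), the list sink.records holds the upper-cased records. No stage in the sketch contains a loop; only the driver does.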

Each stage that acts as an input driver is started in a thread.
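One way to picture this is with the standard threading module. The sketch below is an assumption about the general shape only: ListSource and run_sources are hypothetical names, and the downstream callable stands in for the first method call into the rest of the pipeline.

```python
import threading


class ListSource:
    """Hypothetical input driver: pushes each record to a downstream callable."""

    def __init__(self, records, downstream):
        self.records = records
        self.downstream = downstream

    def start(self):
        # Run the driver loop in its own thread.
        thread = threading.Thread(target=self._drive)
        thread.start()
        return thread

    def _drive(self):
        for record in self.records:
            self.downstream(record)   # direct call into the next stage


def run_sources(sources):
    """Start every input driver, then wait for all of them to finish."""
    threads = [source.start() for source in sources]
    for thread in threads:
        thread.join()
```

With several sources running at once, records from different drivers may arrive downstream in any interleaving, so a shared downstream stage would need to tolerate that.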

Thus there is no need for state memory or looping or testing or suspending within a stage. Any of these can be added to a stage, but only as needed.

Each Python Pipelines stage is an instance of a class with a rich set of methods for dealing with various situations.

External links

Python Pipelines Project - Pipelines on your PC: http://code.google.com/p/python-pipelines/


Wikimedia Foundation. 2010.
