Python Pipelines
Python Pipelines is a cross-platform implementation of the Hartmann pipeline, written in Python. Pipelines lets you solve a complex problem by breaking it up into a series of smaller, less complex programs. These simple programs, also called stages, can then be hooked together to get the results you want. The output of one stage is the input to the next stage. A series of stages is called a pipeline. Each stage consists of a stage name and its operands. Pipelines has many built-in stages; you may add your own, written in Python or pseudo-Rexx. Stages may have more than one input and/or output stream; these streams may be connected to other stages in any order.
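As a loose analogy only (plain Python generators, not the Python Pipelines API), the sketch below hooks stages together so that each stage's output is the next stage's input. Each stage is built from its operands, echoing the "stage name plus operands" idea. The names `locate`, `upper_case` and `run_pipeline` are hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical model: a stage is a function that takes an iterable of input
# records and yields output records.
Stage = Callable[[Iterable[str]], Iterator[str]]


def locate(target: str) -> Stage:
    """Hypothetical stage: pass only records containing `target` (its operand)."""
    def stage(records: Iterable[str]) -> Iterator[str]:
        for record in records:
            if target in record:
                yield record
    return stage


def upper_case() -> Stage:
    """Hypothetical stage with no operands: upper-case every record."""
    def stage(records: Iterable[str]) -> Iterator[str]:
        for record in records:
            yield record.upper()
    return stage


def run_pipeline(records: Iterable[str], *stages: Stage) -> list[str]:
    """Hook the stages together: each stage's output is the next stage's input."""
    stream: Iterable[str] = records
    for stage in stages:
        stream = stage(stream)
    return list(stream)


if __name__ == "__main__":
    lines = ["Hello world", "goodbye", "Hello again"]
    print(run_pipeline(lines, locate("Hello"), upper_case()))
    # prints ['HELLO WORLD', 'HELLO AGAIN']
```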
Pipelines is a superset of pipes as found in Unix shells. Python Pipelines may be invoked from a command prompt or incorporated into another Python program as an imported module.
In CMS Pipelines, users may write their own stages in the Rexx programming language. Because of the nature of the dispatcher (each stage is a co-routine), these programs are more complex than they need to be.
Python Pipelines supports coding stages the IBM way (in pseudo-Rexx), which makes it relatively easy to incorporate legacy code or to write new stages using a familiar technology.
Ideally, however, Python Pipelines users learn to write their own stages in Python, using a much simpler approach: each stage has a run() method that is invoked as many times as needed. Seven or more lines of Rexx are thus replaced by two lines of Python.
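A minimal sketch of the run()-per-record idea, assuming a hypothetical framework function `drive()` that calls a stage's `run()` method once for each record. The class `Copy` and the convention that `run()` returns the output record are illustrative assumptions, not the documented Python Pipelines interface. The point is that the stage body itself contains no loop and no end-of-file test.

```python
class Copy:
    """Hypothetical two-line stage: pass each input record to the output."""

    def run(self, record: str) -> str:
        return record


def drive(stage: Copy, records: list[str]) -> list[str]:
    # Hypothetical framework code: the looping lives here, not in the stage,
    # so run() is simply invoked once for every record.
    return [stage.run(record) for record in records]


if __name__ == "__main__":
    print(drive(Copy(), ["a", "b"]))  # prints ['a', 'b']
```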
Documentation
The best and most comprehensive documentation is available at http://vm.marist.edu/~pipeline/pipeline.pdf
Examples
When the program is invoked from a command prompt, the vertical bar (|) separates stages and the question mark (?) separates pipelines.
The first stage (<) reads records from the file input.txt and passes them, one at a time, to the second stage (locate).
The label A: tells the pipe processor that locate has two outputs. locate passes records that contain Hello to its primary output and all other records to its secondary output. Primary output records flow to the third stage (>), which writes them to the file found.txt.
Secondary output records flow to the next pipeline, entering at the label A:, and from there to the fourth stage (>), which writes them to the file notfound.txt.
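The data flow of this two-output example can be modelled in a few lines of plain Python. This is a hypothetical sketch, not Python Pipelines code: in-memory lists stand in for input.txt, found.txt and notfound.txt, and `locate` here is an illustrative function that routes each record, one at a time, to a primary or secondary output callback.

```python
from typing import Callable, Iterable

# Hypothetical: an output stream is modelled as a callable accepting one record.
Output = Callable[[str], None]


def locate(target: str, primary: Output, secondary: Output) -> Output:
    """Hypothetical two-output stage: route each record to primary or secondary."""
    def stage(record: str) -> None:
        if target in record:
            primary(record)
        else:
            secondary(record)
    return stage


def run_example(input_lines: Iterable[str]) -> dict[str, list[str]]:
    files: dict[str, list[str]] = {"found.txt": [], "notfound.txt": []}
    # Stage 3 (> found.txt) sits on the primary output; stage 4
    # (> notfound.txt) sits on the secondary output, i.e. the stream that
    # re-enters at label A:. Lists stand in for the real files.
    stage2 = locate("Hello", files["found.txt"].append, files["notfound.txt"].append)
    # Stage 1 (< input.txt): feed the records to locate one at a time.
    for record in input_lines:
        stage2(record)
    return files


if __name__ == "__main__":
    print(run_example(["Hello world", "goodbye", "Say Hello"]))
    # prints {'found.txt': ['Hello world', 'Say Hello'], 'notfound.txt': ['goodbye']}
```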
A minimal REXX stage written for CMS Pipelines
This stage passes each input record to the output:
The equivalent in Python for Python Pipelines
Why is this so much simpler than the REXX version for CMS Pipelines?
In CMS Pipelines, each stage is a program. All stage programs are started, and each runs either to termination or until it issues a pipeline command (such as PEEKTO, READTO or OUTPUT). The pipeline dispatcher receives control when such a command is issued and decides which stage to resume next. Stages therefore require state memory, looping and decision making.
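To illustrate why dispatcher-driven stages need loops and state, the toy model below uses Python generators as stand-ins for co-routines. Each stage yields a request (a hypothetical `readto` or `output` tuple, loosely named after the CMS commands, whose real semantics are subtler) and is suspended; a simple round-robin dispatcher decides which stage to resume next. This is an illustrative assumption, not how the CMS dispatcher is actually implemented; `source`, `copy_stage` and `dispatch` are hypothetical names.

```python
from collections import deque
from typing import Generator, Optional

# A stage yields ("readto",) or ("output", record) and may receive a record
# (or None for end-of-file) back from the dispatcher.
StageGen = Generator[tuple, Optional[str], None]


def source(records: list[str]) -> StageGen:
    for record in records:
        yield ("output", record)


def copy_stage() -> StageGen:
    # The stage must run its own loop and test for end-of-file after each read.
    while True:
        record = yield ("readto",)
        if record is None:  # end-of-file on the input stream
            return
        yield ("output", record)


def dispatch(stages: list[StageGen]) -> list[str]:
    """Toy round-robin dispatcher for a linear pipeline of generator stages."""
    n = len(stages)
    queues = [deque() for _ in range(n)]  # queues[i] holds output of stage i
    finished = [False] * n
    pending: list[Optional[tuple]] = [None] * n  # request each stage waits on
    for i, stage in enumerate(stages):  # run each stage to its first request
        try:
            pending[i] = next(stage)
        except StopIteration:
            finished[i] = True
    progress = True
    while progress and not all(finished):
        progress = False
        for i, stage in enumerate(stages):
            if finished[i]:
                continue
            request = pending[i]
            if request[0] == "output":
                queues[i].append(request[1])
                reply = None
            elif i > 0 and queues[i - 1]:
                reply = queues[i - 1].popleft()
            elif i == 0 or finished[i - 1]:
                reply = None  # end-of-file: no more input will arrive
            else:
                continue  # blocked until the upstream stage produces a record
            progress = True
            try:
                pending[i] = stage.send(reply)  # resume the suspended stage
            except StopIteration:
                finished[i] = True
    return list(queues[-1])


if __name__ == "__main__":
    print(dispatch([source(["a", "b"]), copy_stage()]))  # prints ['a', 'b']
```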
In Python Pipelines, streams are connected directly. A record container is passed from stage 1 to stage 2 to stage 3 by a series of method calls. Normally the record container reaches a stage that "consumes" it by returning. Control then passes back through the prior stages, eventually reaching either a stage that sources another record container or the start of the pipeline, in which case that pipeline terminates.
A record container is an object that holds the record and related information, such as its end-of-file (EOF) status.
Each stage that acts as an input driver is started in its own thread.
Thus there is no need for state memory, looping, testing or suspending within a stage. Any of these can be added to a stage, but only as needed.
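A minimal sketch of this push model, under stated assumptions: `RecordContainer`, `Stage`, `output()`, `Upper`, `Collect` and `start_input_driver()` are hypothetical names rather than the documented Python Pipelines API. The input driver runs in its own thread and passes each record container directly down the chain by method calls; the consuming stage simply returns, handing control back to the driver, which then sources the next container. Note that `Upper.run()` contains no loop, state or suspension.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordContainer:
    """Hypothetical record container: the record plus related information."""
    record: Optional[str] = None
    eof: bool = False


class Stage:
    """Hypothetical base class; each stage holds a direct link to the next."""

    def __init__(self, next_stage: Optional["Stage"] = None) -> None:
        self.next_stage = next_stage

    def output(self, rc: RecordContainer) -> None:
        # Direct connection: a plain method call on the next stage. When it
        # returns, the container has been consumed further down the pipeline.
        if self.next_stage is not None:
            self.next_stage.run(rc)

    def run(self, rc: RecordContainer) -> None:
        self.output(rc)  # default: pass the container on unchanged


class Upper(Stage):
    def run(self, rc: RecordContainer) -> None:
        if not rc.eof:  # only real records carry text to transform
            rc.record = rc.record.upper()
        self.output(rc)


class Collect(Stage):
    """Consuming stage: stores each record and simply returns control."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[str] = []

    def run(self, rc: RecordContainer) -> None:
        if not rc.eof:
            self.records.append(rc.record)


def start_input_driver(records: list[str], first: Stage) -> threading.Thread:
    """Input driver: runs in its own thread and sources one container per record."""
    def drive() -> None:
        for record in records:
            first.run(RecordContainer(record))  # returns once consumed
        first.run(RecordContainer(eof=True))    # signal end-of-file downstream

    thread = threading.Thread(target=drive)
    thread.start()
    return thread


if __name__ == "__main__":
    sink = Collect()
    start_input_driver(["hello", "world"], Upper(sink)).join()
    print(sink.records)  # prints ['HELLO', 'WORLD']
```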
Each Python Pipelines stage is an instance of a class with a rich set of methods for dealing with various situations.
External links
* Python Pipelines Project - Pipelines on your PC: http://code.google.com/p/python-pipelines/
Wikimedia Foundation. 2010.