- Flow-based programming
In
computer science , flow-based programming (FBP) is aprogramming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections bymessage passing . These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.The FBP development approach views an application not as a single, sequential, process, which starts at a point in time, and then does one thing at a time until it is finished, but as a network of asynchronous processes communicating by means of streams of structured data chunks, called "information packets" (IPs). In this view, the focus is on the application data and the transformations applied to it to produce the desired outputs. The network is defined externally to the processes, as a list of connections which is interpreted by a piece of software, usually called the "scheduler".
The processes communicate by means of fixed-capacity connections. A connection is attached to a process by means of a port, which has a name agreed upon between the process code and the network definition. More than one process can execute the same piece of code. At any point in time, a given IP can only be "owned" by a single process, or be in transit between two processes. Ports may either be simple, or array-type, as used e.g. for the input port of the Collate component described below. It is the combination of ports with asynchronous processes that allows many long-running primitive functions of data processing, such as Sort, Merge, Summarize, etc., to be supported in the form of software black boxes.
The network definition is usually diagrammatic, and is converted into a connection list in some lower-level language or notation. FBP is thus a
visual programming language at this level. More complex network definitions have a hierarchical structure, being built up from subnets with "sticky" connections.FBP has much in common with the Linda [N. Carriero and D. Gelernter, "Linda in Context", Communications of the ACM, Vol. 32, No. 4, April 1989] language in that it is, in Gelernter and Carriero's terminology, a "coordination language" [N. Carriero and D. Gelernter, "Coordination Languages and their Significance", Communications of the ACM, Vol. 35, No. 2, February 1992] : it is essentially language-independent. Indeed, given a scheduler written in a sufficiently low-level language, components written in different languages can be linked together in a single network. FBP thus lends itself to the concept of
domain-specific language s or "mini-languages".FBP exhibits "data coupling", described in the article on coupling as the loosest type of coupling between components. The concept of
loose coupling is in turn related to that ofService-oriented architecture s, and FBP fits a number of the criteria for such an architecture, albeit at a more fine-grained level than most examples of this architecture.History
FBP was invented by J. Paul Morrison in the early 1970s, and an early implementation of this technology has been in continuous production use at a major Canadian bank since that time.
FBP at its inception was strongly influenced by some IBM simulation languages of the period, in particular
GPSS , but its roots go all the way back to Conway's seminal paper on what he calledcoroutines . [M.E. Conway, "Design of a separable transition-diagram compiler", Communications of the ACM, Vol. 6, No. 7, July 1963]FBP has undergone a number of name changes over the years: the original implementation was called AMPS (Advanced Modular Processing System). A number of the basic concepts were put into the public domain by IBM, by means of a Technical Disclosure Bulletin in 1971, using a very general title. [J. Paul Morrison, "Data Responsive Modular, Interleaved Task Programming System," IBM Technical Disclosure Bulletin, Vol. 13, No. 8, 2425-2426, January 1971] An article describing its concepts and experience using it was published in 1978 in the
IBM Systems Journal under the name DSLM. [J. Paul Morrison, "Data Stream Linkage Mechanism," IBM Systems Journal Vol. 17, No. 4, 1978] A second implementation was done as a joint project of IBM Canada and IBM Japan, under the name "Data Flow Development Manager" (DFDM), and was briefly marketed in Japan in the late '80s under the name "Data Flow Programming Manager".Generally the concepts were referred to within IBM as "Data Flow", but this term was felt to be too general, and eventually the name Flow-Based Programming was adopted, and a book with that title was published in 1994.J. Paul Morrison, "Flow-Based Programming:A New Approach to Application Development," van Nostrand Reinhold, 1994, ISBN 0-442-01771-5] It is now out of print.
The late IBM architect,
Wayne Stevens , wrote several articles describing and supporting the FBP concept, and included material about it in several of his books. [W.P. Stevens, "How Data Flow can Improve Application Development Productivity," IBM System Journal, Vol. 21, No. 2, 1982] [W.P. Stevens, "Using Data Flow for Application Development", Byte, June 1985] [W.P. Stevens, "Software Design - Concepts and Methods", Practical Software Engineering Series, Ed. Allen Macro, Prentice Hall, 1990, ISBN 0-13-820242-7]As of 2006 several companies were marketing tools based on FBP concepts, among them: Daxisoft Corporation [cite web|author=Daxisoft Corporation|title=Flow Based XML IDE (DaxiXML)|url=http://www.daxisoft.com/products.html|accessdate=2006-07-10] , Trelliswerk LLC [cite web|author=Trelliswerk LLC|title=About Flow-Based Programming|url=http://www.trelliswerk.com/Background.htm|accessdate=2006-07-10] , and Proto Software, Inc. [cite web|author=Proto Software Inc.|title=Proto components: Reuse that actually works|url=http://www.protosw.com/products/financial#reuse|accessdate=2006-07-25] IBM also sells a tool for general data transformation called
DataStage which combines FBP with parallel processing.Concepts
The following diagram shows the major entities of an FBP diagram (apart from the IPs). Such a diagram can be converted directly into a list of connections, which can then be executed by an appropriate engine (software or hardware).
A, B and C are processes executing code components. O1, O2, and the two INs are ports connecting the connections M and N to their respective processes. It is permitted for processes B and C to be executing the same code, so each process must have its own set of working storage, control blocks, etc. Whether or not they do share code, B and C are free to use the same port names, as port names only have meaning within the components referencing them (and at the network level, of course).
M and N are what are often referred to as "bounded buffers", and have a fixed capacity in terms of the number of IPs that they can hold at any point in time.
The concept of "ports" is what allows the same component to be used at more than one place in the network. In combination with a parametrization capability, called Initial Information Packets (IIPs), ports provide FBP with a component reuse capability, making FBP a component-based architecture. FBP thus exhibits what
Nate Edwards ofIBM Research has termedconfigurable modularity .Information Packets or IPs are allocated in what might be called "IP space" (just as Linda's tuples are allocated in "tuple space"), and have a well-defined lifetime until they are disposed of and their space is reclaimed - in FBP this must be an explicit action on the part of an owning process. IPs traveling across a given connection (actually it is their "handles" that travel) constitute a "stream", which is generated and consumed asynchronously - this concept thus has similarities to the lazy cons concept described in the 1976 article by Friedman and Wise. [D.P. Friedman and D.S. Wise, "CONS should not evaluate its arguments," Automata, Languages and Programming, Edinburgh University Press, Edinburgh, 1976]
IPs are usually structured chunks of data - some IPs, however, may not contain any real data, but are used simply as signals. An example of this is "bracket IPs", which can be used to group data IPs into sequential patterns within a stream, called "substreams". Substreams may in turn be nested. IPs may also be chained together to form "IP trees", which travel through the network as single objects.
The system of connections and processes described above can be "ramified" to any size. During the development of an application, monitoring processes may be added between pairs of processes, processes may be "exploded" to subnets, or simulations of processes may be replaced by the real process logic. FBP therefore lends itself to rapid prototyping.
This is really an
assembly line image of data processing: the IPs travelling through a network of processes may be thought of as widgets travelling from station to station in an assembly line. "Machines" may easily be reconnected, taken off line for repair, replaced, and so on. Oddly enough, this image is very similar to that ofunit record equipment that was used to process data before the days of computers, except that decks of cards had to be hand-carried from one machine to another.Implementations of FBP may be non-preemptive or preemptive - the earlier implementations tended to be non-preemptive (mainframe and C language), whereas the latest Java implementation uses Java Thread class and is preemptive.
Examples
"Telegram Problem"
FBP components often form complementary pairs. This example uses two such pairs. The problem described seems very simple as described in words, but in fact is surprisingly hard to do using conventional procedural logic. The task, called the "Telegram Problem", originally described by
Peter Naur , is to write a program which accepts lines of text and generates output lines of a different length, without splitting any of the words in the text (we assume no word is longer than the size of the output lines).In conventional logic, the programmer has to realize that neither the input nor the output end can be used as the "main line". In FBP, on the other hand, the problem description itself suggests a solution:
* "words" are mentioned explicitly in the description of the problem, so it is reasonable for the designer to treat words as information packets (IPs)
* there is no "main line" in FBP, so the programmer is not tempted to force a subpattern of the solution to be the main line.Here is the most natural solution in FBP (there is no single "correct" solution in FBP, but this seems like a natural fit):
As mentioned above, Initial Information Packets (IIPs) can be used to specify parametric information such as the desired output record length (required by the rightmost two components), or file names. IIPs are data chunks associated with a port in the network definition which become "normal" IPs when a "receive" is issued for the relevant port.
Batch update
This type of program involves passing a file of "details" (changes, adds and deletes) against a "master file", and producing (at least) an updated master file, and one or more reports. Update programs are generally quite hard to code using synchronous, procedural code, as two (sometimes more) input streams have to be kept synchronized, even though there may be masters without corresponding details, or vice versa.
In FBP, a reusable component (Collate), based on the unit record idea of a Collator, makes writing this type of application much easier as Collate merges the two streams and inserts bracket IPs to indicate grouping levels, significantly simplifying the downstream logic. Suppose that one stream ("masters" in this case) consists of IPs with key values of 1, 2 and 3, and the second stream IPs ("details") have key values of 11, 12, 21, 31, 32, 33 and 41, where the first digit corresponds to the master key values. Using bracket characters to represent "bracket" IPs, the collated output stream will be as follows:
( m1 d11 d12 ) ( m2 d21 ) ( m3 d31 d32 d33 ) (d41)
As there was no master with a value of 4, the last group consists of a single detail (plus brackets).
The structure of the above stream can be described succinctly using a BNF-like notation such as
{ ( [m] d* ) }*
Collate is a reusable black box which only needs to know where the control fields are in its incoming IPs (even this is not strictly necessary as transformer processes can be inserted upstream to place the control fields in standard locations), and can in fact be generalized to any number of input streams, and any depth of bracket nesting. Collate uses an array-type port for input, allowing a variable number of input streams.
Simple interactive network
In this general schematic, requests (transactions) coming from users enter the diagram at the upper left, and responses are returned at the lower left. The "back ends" (on the right side) communicate with systems at other sites, e.g. using CORBA,
MQSeries , etc. The cross-connections represent requests that do not need to go to the back ends, or requests that have to cycle through the network more than once before being returned to the user.As different requests may use different back-ends, and may require differing amounts of time for the back-ends (if used) to process them, provision must be made to relate returned data to the appropriate requesting transactions, e.g. hash tables or caches.
The above diagram is schematic in the sense that the final application may contain many more processes: processes may be inserted between other processes to manage caches, display connection traffic, monitor throughput, etc. Also the blocks in the diagram may represent "subnets" - small networks with one or more open connections.
Comparison with other paradigms and methodologies
Jackson Structured Programming (JSP)
This methodology assumes that a program must be structured as a single procedural hierarchy of subroutines. Its starting point is to describe the application as a set of "main lines", based on the input and output data structures. One of these "main lines" is then chosen to drive the whole program, and the others are required to be "inverted" to turn them into subroutines (hence the name "Jackson inversion"). This sometimes results in what is called a "clash", requiring the program to be split into multiple programs or coroutines. When using FBP, this inversion process is not required, as every FBP component can be considered a separate "main line".
FBP and JSP share the concept of treating a program (or some components) as a
parser of an input stream. The FBP book contains a discussion of how the concept of push-down automata may be used to design components (Chapter 24). It describes how a stack of controlling IPs may be used to control nested substreams in an FBP data stream.Applicative programming
W.B. Ackerman defines an applicative language as one which does all of its processing by means of operators applied to values. [W.B. Ackerman, "Data Flow Languages", Proceedings National Computer Conference, pp. 1087-1095, 1979] The earliest known applicative language was LISP.
An FBP component can be regarded as a function transforming its input stream(s) into its output stream(s). These functions are then combined to make more complex transformations, as shown here:
If we label streams, as shown, with lower case letters, then the above diagram can be represented succinctly as follows:
c = G(F(a),F(b));
Just as in functional notation F can be used twice because it only works with values, and therefore has no side effects, in FBP two instances of a given component may be running concurrently with each other, and therefore FBP conmponents must not have side-effects either. Functional notation could clearly be used to represent at least a part of an FBP network.
The question then arises whether FBP components can themselves be expressed using functional notation. W.H. Burge showed how stream expressions can be developed using a recursive, applicative style of programming, but this work was in terms of (streams of) atomic values. [W.H. Burge, "Recursive Programming Techniques", Addison-Wesley, Reading, MA, 1975] In FBP, it is necessary to be able to describe and process structured data chunks (FBP IPs). In the FBP book, a notation is added for accessing the fields of an IP, and an operator, called the "mini-constructor" (μ), based on a similar function in the Vienna Definition Language, for creating an IP from a set of (perhaps modified) field values and identifiers.
Furthermore, most applicative systems assume that all the data is available in memory at the same time, whereas FBP applications need to be able to process long-running streams of data while still using finite resources. Friedman and Wise suggested a way to do this by adding the concept of "lazy cons" to Burge's work. This removed the requirement that both of the arguments of "cons" be available at the same instant of time. "Lazy cons" does not actually build a stream until both of its arguments are realized - before that it simply records a "promise" to do this. This allows a stream to be dynamically realized from the front, but with an unrealized back end. The end of the stream stays unrealized until the very end of the process, while the beginning is an ever-lengthening sequence of items.
In the FBP book (Chapter 25), these ideas are combined to allow the expression of some quite complex component logic using applicative notation.
Linda
Many of the concepts in FBP seem to have been discovered independently in different systems over the years. Linda, mentioned above, is one such. Chapter 27 of the FBP book goes into some detail about similarities and differences, but probably the major difference is that, in Linda, data is accessed associatively, whereas in FBP, IPs arriving at a particular input port are retrieved sequentially. FBP's IPs are very similar to Linda's
tuple s. The difference between the two techniques is illustrated by the Linda "school of piranhas" load balancing technique - in FBP, this requires an extra "load balancer" component which routes requests to the component in a list which has the smallest number of IPs waiting to be processed. Clearly FBP and Linda are closely related, and one could easily be used to simulate the other.Object-oriented programming
An object in
OOP can be described as a semi-autonomous unit comprising both information and behaviour. Objects communicate by means of "method calls", which are essentially subroutine calls, done indirectly via the class to which the receiving object belongs. The object's internal data can only be accessed by means of method calls, so this is a form ofinformation hiding or "encapsulation". Encapsulation, however, predatesOOP -David Parnas wrote one of the seminal articles on it in the early 70s [D. Parnas, "On the criteria to be used in decomposing systems into modules", Communications of the ACM, Vol. 5, No. 12, Dec. 1972, pp. 1053-8] - and is a basic concept in computing. Encapsulation is the very essence of an FBP component, which may be thought of as ablack box , performing some conversion of its input data into its output data. In FBP, part of the specification of a component is the data formats and stream structures that it can accept, and those it will generate. This constitutes a form ofdesign by contract . In addition, the data in an IP can only be accessed directly by the currently owning process. Encapsulation can also be implemented at the network level, by having outer processes protect inner ones.A paper by C. Ellis and S. Gibbs distinguishes between
active objects and passive objects. [C. Ellis and S. Gibbs, "Active Objects: Realities and Possibilities", in "Object-Oriented Concepts, Databases, and Applications", eds. W. Kim and F.H. Lochovsky, ACM Press, Addison-Wesley, 1989] Passive objects comprise information and behaviour, as stated above, but they cannot determine the "timing" of this behaviour. Active objects on the other hand can do this. In their article Ellis and Gibbs state that active objects have much more potential for the development of maintainable systems than do passive objects. An FBP application can be viewed as a combination of these two types of object, where FBP processes would correspond to active objects, while IPs would correspond to passive objects.Chapter 26 of the FBP book goes into more detail on the relationship between FBP and
OOP .ee also
*
Concurrent computing
*Dataflow programming
*Communicating Sequential Processes
*Active objects
*Actor model
*MapReduce References
External links
* [http://www.pervasivedatarush.com/ DataRush] : Dataflow framework for Java.
* [http://www.jpaulmorrison.com/fbp/index.shtml Flow-Based Programming]
* [http://sourceforge.net/projects/flow-based-pgmg JavaFBP] : Open source framework for Java and C#
* [http://reengineer.org/stevens/index2004.html#stevens Wayne Stevens]
* [http://lambda-the-ultimate.org/node/1210 "The new old or The 'Return' to Concurrency"] - (discussion on "Lambda the Ultimate" about FBP,Unix , Erlang,OmniMark ,Monad shell ,Kamaelia , etc.)
* [http://lambda-the-ultimate.org/node/193 "The right default: concurrent components with message passing"] - (discussion on "Lambda the Ultimate" about approaches to structuring programs)Articles
*cite journal
last = Razdow
first = Allen
year = 1997
month= December
title = Building Enterprise Data Refineries
journal = DMReview
url = http://www.dmreview.com/article_sub.cfm?articleId=689
accessdate = 2006-07-15*cite web
last = Mayer
first = Anthony
coauthors = S. McGough, M. Gulamali, L. Young, J. Stanton, S. Newhouse, J. Darlington
title = Meaning and Behaviour in Grid Oriented Components
publisher = London e-Science Centre, Imperial College of Science, Technology and Medicine
date = 2002
url = http://www.lesc.ic.ac.uk/iceni/pdf/Grid2002.pdf
format = PDF
accessdate = 2006-07-15*cite web
last = Black
first = Andrew
coauthors = J. Huang, R. Koster, J. Walpole, C. Pu
title = Infopipes: An abstraction for multimedia streaming
publisher = Springer-Verlag, Multimedia Systems
date = 2002
doi = 10.1007/s00530-002-0062-3
url = http://web.cecs.pdx.edu/~black/publications/Mms062%203rd%20try.pdf
format = PDF
accessdate = 2006-08-10*cite web
title = Infopipe is registered as a Trademark or Service Mark with the US Patent and Trademark Office
date = 1999
url = http://www.infopipe.com
format = HTML
accessdate = 2007-01-15*cite journal
last = Kra
first = David
year = 2004
month = Oct.
title = zSeries and iSeries servers in the grid domain
journal = IBM developerWorks (gridComputing)
url = http://www-128.ibm.com/developerworks/grid/library/gr-ziseries/
accessdate = 2006-07-13*cite web
last = Ludäscher
first = Bertram
coauthors = I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E.A. Lee, J. Tao, Y. Zhao
title = Scientific Workflow Management and the Kepler System
publisher = San Diego Supercomputer Center
date = September 2004; revised March 2005
url = http://users.sdsc.edu/~ludaesch//Paper/kepler-swf.pdf
format = PDF
accessdate = 2006-07-14*cite web
last = Bickle
first = Jerry
coauthors = K. Richardson, J. Smith
title = OMG Software Radio Specification Overview for Robotics
publisher = Object Management Group - Software-Based Communications
date = 2005
url = http://www.omg.org/docs/robotics/05-01-06.pdf
format = PDF
accessdate = 2006-07-15*cite web
last = Sanford
first = Leslie
title = C# MIDI Toolkit
url = http://www.codeproject.com/cs/media/MIDIToolkit.asp
format = HTML
year = 2004
month = February
accessdate = 2006-10-14*cite journal
last = Blažević
first = Mario
year = 2006
title = Streaming Component Combinators
journal = Proceedings of Extreme Markup Languages
url = http://www.idealliance.org/papers/extreme/Proceedings/html/2006/Blazevic01/EML2006Blazevic01.html
accessdate = 2006-11-09* cite book
last = Kauler
first = Barry
authorlink =
coauthors =
title = Flow Design for Embedded Systems, 2nd Edition
publisher = R&D Books/Miller Freeman
date = 1999
location =
pages =
url =
doi =
id = ISBN 0-87930-555-X* [http://patft.uspto.gov/netacgi/nph-Parser?patentnumber=5,204,965 U.S. Patent #5,204,965] (
1993-04-20 ) S.B. Guthery, P.S. Barth, D.R. Barstow, "Data processing system using stream stores".* cite web
author = B. Visscher
title = Bart's Operating System Structure (BOSS)
publisher = Delft University of Technology, Faculty of Information Technology and Systems, Section Knowledge Based Systems
date = 2001
url = http://www.kbs.twi.tudelft.nl/docs/MSc/2001/Visscher_Bart-Floris/thesis.pdf
format = PDF
accessdate = 2006-12-05* cite web
author = G. P. Staplin
title = Tcl Flow Based Programming (TFP)
url = http://www.xmission.com/~georgeps/documentation/software/TFP5.pdf
format = PDF
accessdate = 2006-12-05* cite web
author = J. Lowery
title = Flow-based Programming Library for Python (zflow)
url = http://cheeseshop.python.org/pypi/zflow/0.01
accessdate = 2006-12-05* cite web
author = W.M. Johnston, J.R.P. Hanna, R.J. Millar
title = Advances in dataflow programming languages
publisher = ACM Computing Surveys (CSUR)
year = 2004
month = March
volume = 36
issue = 1
url = http://portal.acm.org/citation.cfm?id=1013208.1013209
accessdate = 2006-12-05*cite web
last = Koster
first = Rainer
coauthors = A.P. Black, J. Huang, J. Walpole, C. Pu
title = Thread transparency in information flow middleware
publisher = ACM Digital Library: Software--Practice & Experience
volume = 33
issue = 4
year = 2003
month = April
url = http://portal.acm.org/citation.cfm?id=777886&dl=acm&coll=&CFID=15151515&CFTOKEN=6184618
accessdate = 2006-12-05*cite web
last = Stevenson
first = Tony
title = Review of "Flow-Based Programming"
publisher = PC Update, the magazine of Melbourne PC User Group, Australia
year = 1995
month = February
url =http://www.melbpc.org.au/pcupdate/9502/9502article7.htm
accessdate = 2006-12-06*cite web
title = About Flow-Based Programming
publisher = Trelliswerk - Company Background
url = http://www.trelliswerk.com/Background.htm
accessdate = 2006-12-06*cite web
title = Programmability Framework
publisher = Codito - Programmability Framework
url = http://www.codito.com/prodtech_framework.html
accessdate = 2006-12-06*cite web
title = Proto components: Reuse that actually works
publisher = Proto Software
url = http://www.protosw.com/products/financial#reuse
accessdate = 2006-12-06*cite web
title = Flow Based XML IDE
publisher = Daxisoft
url = http://www.daxisoft.com/products.html
accessdate = 2006-12-06*cite web
title = Talend
publisher = Talend.com
url = http://www.talend.com
accessdate = 2008-07-21*cite web
title = Dataflow Programming in Python
publisher = TheNSys
url = http://www.thensys.com/
year = 2006
accessdate = 2006-12-06*cite web
last = Lea
first = Doug
title = Composing Oneway Messages
url = http://g.oswego.edu/dl/cpj/s4.2.html
year = 2001
month = May
accessdate = 2006-12-06*cite web
last = Bowers
first = Shawn
coauthors = B. Ludäscher, A.H.H. Ngu, T. Critchlow
publisher = SciFlow '06
title = Enabling Scientific Workflow Reuse through Structured Composition of Dataflow and Control-Flow
url = http://daks.ucdavis.edu/~ludaesch/289F-SQ06/handouts/7-templates-frames-sciflow.pdf
accessdate = 2006-12-06
Wikimedia Foundation. 2010.