- Bulk synchronous parallel
The Bulk Synchronous Parallel computer is a model for designing parallelalgorithms. It serves a similar purpose to the
PRAM model. BSP differs from PRAM by not taking communication and synchronization for granted. An importantpart of analysing a BSP algorithm rests on quantifying the synchronisation and communication needed.The model
A BSP computer consists of processors connected by a communicationnetwork. Each processor has a fast local memory, and may followdifferent threads of computation.
A BSP computation proceeds in a series of global "supersteps". Asuperstep consists of three ordered stages:
# "Concurrent computation" : Several computations take place on every participating processor. Each process only uses values stored in the local memory of the processor. The computations are independent in the sense that they occur asynchronously of all the others.
# "Communication" : At this stage, the processes exchange data between themselves.
# "Barrier synchronisation" : When a process reaches this point (the "barrier"), it waits until all other processes have finished their communication actions.The figure below shows this in a diagrammatic form. The processes are notregarded as having a particular linear order (from left to right orotherwise), and may be mapped to processors in any way.
Communication
In many parallel programming systems, communications are considered atthe level of individual actions: sending and receiving a message, memoryto memory transfer, etc. This is difficult to work with, since thereare many simultaneous communication actions in a parallel program, andtheir interactions are typically complex. In particular, it isdifficult to say much about the time any single communication actionwill take to complete.
The BSP model considers communication actions "en masse". This hasthe effect that an upper bound on the time taken to communicate a set ofdata can be given. BSP considers all communication actions of asuperstep as one unit, and assumes all messages have a fixed size.
The maximum number of incoming or outgoing messages for a superstep isdenoted by h. The ability of a communication network todeliver data is captured by a parameter g, defined suchthat it takes time hg for a processor to deliverh messages of size 1.
A message of length m obviously takes longer to send than amessage of size 1. However, the BSP model does not make a distinctionbetween a message length of m or m messages oflength 1. In either case the cost is said to be mhg.
The parameter g is dependent on the following factors:
* The protocols used to interact within the communication network.
* Buffer management by both the processors and the communication network.
* The routing strategy used in the network.
* The BSP runtime system.A value for g is, in practice, determined empirically foreach parallel computer. Note that g is not the normalisedsingle-word delivery time, but the single-word delivery time undercontinuous traffic conditions.
Barriers
On most of today's architectures, barrier synchronisation is oftenexpensive, so should be used sparingly. However, future architecturedevelopments may make them much cheaper. The cost of barriersynchronisation is influenced by a couple of issues:
# The cost imposed by the variation in the completion time of the participating concurrent computations. Take the example where all but one of the processes have completed their work for this superstep, and are waiting for the last process, which still has a lot of work to complete. The best that an implementation can do is ensure that each process works on roughly the same problem size.
# The cost of reaching a globally-consistent state in all of the processors. This depends on the communication network, but also on whether there is special-purpose hardware available for synchronising, and on the way in which interrupts are handled by processors.The cost of a barrier synchronisation is denoted by l. Inpractice, a value of l is determined empirically.
Barriers are potentially costly, but have a number of attractions. Theydo not introduce the possibility of deadlock or livelock, since barriersdo not create circular data dependencies. Therefore tools to detect and deal with them areunnecessary. Barriers also permit novel forms of fault tolerance.
The Cost of a BSP algorithm
The cost of a superstep is determined as the sum of three terms; thecost of the longest running local computation, the cost of globalcommunication between the processors, and the cost of the barriersynchronisation at the end of the superstep. The cost of one superstepfor p processors:
max_{i = 1}^{p}(w_i) + max_{i=1}^{p}(h_i g) + l
where w_i is the cost for the local computation in processi, and h_i is the number of messages sent orreceived by process i. Note that homogeneous processorsare assumed here. It is more common for the expression to be written asw + hg + l where w and h aremaxima. The cost of the algorithm then, is the sum of the costs of eachsuperstep.
W + Hg + Sl = sum_{s=1}^{S}w_s + g sum_{s=1}^{S}h_s + Sl
where S is the number of supersteps.
W, H, and S are usually modelledas functions, that vary with problem size. These three characteristicsof a BSP algorithm are usually described in terms of asymptoticnotation, e.g. H in O(n/p).
References
* D.B. Skillicorn, Jonathan Hill, W. F. McColl, [ftp://ftp.comlab.ox.ac.uk/pub/Documents/techpapers/Jonathan.Hill/SkillHillMcColl_QA.ps.gz Questions and answers about BSP] (1996)
ee also
*
Computer cluster
*Concurrent computing
* Concurrency
*Grid computing
*Parallel computing
*ScientificPython
*LogP machine External links
* [http://www.bsp-worldwide.org/ BSP Worldwide]
* [http://www.bsp-worldwide.org/implmnts/oxtool/papers.html BSP related papers]
* [http://web.comlab.ox.ac.uk/oucl/work/bill.mccoll/oparl.html WWW Resources on BSP Computing]
Wikimedia Foundation. 2010.