Byzantine fault tolerance

Byzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, a generalized version of the Two Generals' Problem.

The object of Byzantine fault tolerance is to defend against a "Byzantine failure", in which a component of some system not only behaves erroneously, but also fails to behave consistently when interacting with multiple other components. The correctly functioning components of a Byzantine fault tolerant system can still reach the same group decisions, provided that the number of Byzantine faulty components is small enough.

Byzantine failures

A Byzantine fault is an arbitrary fault that occurs during the execution of an algorithm by a distributed system. It encompasses those faults that are commonly referred to as "crash failures" and "omission failures" (failing to send or to receive a message). When a Byzantine failure has occurred, the system may respond in an unpredictable way, unless it is designed to have Byzantine fault tolerance.

These arbitrary failures may be loosely categorized as follows:
*a failure to take another step in the algorithm, also known as a crash failure;
*a failure to correctly execute a step of the algorithm; and
*arbitrary execution of a step other than the one indicated by the algorithm.

For example, if the output of one function is the input of another, then small round-off errors in the first function can produce much larger errors in the second. If the second function were fed into a third, the problem could grow even larger, until the values produced are worthless. Another example is in compiling source code. One minor syntactical error early on in the code can produce large numbers of perceived errors later, as the compiler gets out-of-phase with the lexical and syntactic information in the source program.

Steps are taken by processes, the abstractions that execute the algorithms. A faulty process is one that at some point exhibits one of the above failures. A process that is not faulty is correct.

The Byzantine failure assumption models real-world environments in which computers and networks may behave in unexpected ways due to hardware failures, network congestion and disconnection, as well as malicious attacks. Byzantine failure-tolerant algorithms must cope with such failures and still satisfy the specifications of the problems they are designed to solve. Such algorithms are commonly characterized by their resilience "t", the number of faulty processes with which an algorithm can cope.

Many classic agreement problems, such as the Byzantine Generals' Problem, have no solution unless "t" < "n" / 3, where "n" is the number of processes in the system.
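The bound "t" < "n" / 3 can be turned into a quick calculation. The following is a minimal sketch, not part of any particular protocol; the function names `max_faulty` and `min_processes` are made up for this illustration.

```python
def max_faulty(n: int) -> int:
    """Largest t with t < n / 3, i.e. the most faulty processes
    that n processes can tolerate under this bound."""
    if n < 1:
        raise ValueError("n must be a positive number of processes")
    # t < n / 3 is equivalent to 3t + 1 <= n, so t = floor((n - 1) / 3).
    return (n - 1) // 3


def min_processes(t: int) -> int:
    """Smallest n for which t faulty processes satisfy t < n / 3."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 3 * t + 1
```

Under this bound, three processes cannot tolerate even one Byzantine fault (`max_faulty(3)` is 0), while four processes can tolerate one (`max_faulty(4)` is 1).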

The Two Generals' Problem is a specific case which assumes that processes are reliable but communication between processes is not reliable.

Origin

"Byzantine" refers to the Byzantine Generals' Problem, an agreement problem in which generals of the Byzantine Empire's army must decide unanimously whether to attack some enemy army. The problem is complicated by the geographic separation of the generals, who must communicate by sending messengers to each other, and by the presence of traitors amongst the generals. These traitors can act arbitrarily in order to achieve the following aims: trick some generals into attacking; force a decision that is not consistent with the generals' desires, e.g. forcing an attack when no general wished to attack; or confuse some generals to the point that they are unable to make up their minds. If the traitors succeed in any of these goals, any resulting attack is doomed, as only a concerted effort can result in victory.

Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals reach unanimous agreement on their strategy. Note that if the source general is correct, all loyal generals must agree upon the value that the source general sent. Otherwise, the choice of strategy agreed upon is irrelevant, as long as it is the same for all loyal generals.
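The two requirements in the paragraph above (all loyal generals agree, and they follow a correct source general) can be written as a small check on the outcome of a run. This is only an illustrative sketch; the function name `check_agreement` and the dictionary layout are assumptions made for the example.

```python
def check_agreement(decisions: dict, loyal: set, source: str, source_value: str) -> bool:
    """Return True if a finished run meets both requirements.

    decisions    -- maps each general's name to the strategy it settled on
    loyal        -- names of the loyal (non-faulty) generals
    source       -- name of the source general
    source_value -- the value the source general was meant to send
    """
    loyal_values = {decisions[g] for g in loyal}
    # Requirement 1: every loyal general settles on the same strategy.
    if len(loyal_values) > 1:
        return False
    # Requirement 2: if the source is loyal, that strategy is the source's value.
    if source in loyal and loyal_values != {source_value}:
        return False
    return True
```

For instance, with decisions `{"A": "attack", "B": "attack", "C": "retreat"}`, the check passes when A and B are loyal and A is the source ordering "attack", and fails when B and C are the loyal generals, because they disagree.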

Solutions

Several solutions were originally described by Lamport, Shostak, and Pease in 1982. They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem, in which the Loyal Lieutenants must all act in unison and their action must correspond to what the Commander ordered whenever the Commander is Loyal. Roughly speaking, the Generals vote by treating each other's orders as votes.

*One solution considers scenarios in which messages may be forged, and is "Byzantine-fault-tolerant" as long as fewer than one third of the generals are traitors. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the 1 Commander + 2 Lieutenants problem cannot be solved if the Commander is traitorous. The reason is as follows. Suppose there are three generals, a Commander A and two Lieutenants B and C, and A is the traitor. When A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it isn't necessarily A: the other Lieutenant could have forged the message purportedly from A. It can be shown that if "n" is the total number of generals, and "t" is the number of traitors among those "n", then there are solutions to the problem only when "n" is greater than or equal to 3"t" + 1.

*A second solution requires unforgeable signatures (in modern computer systems, this may be achieved through public-key cryptography), but maintains Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals.

*Also presented is a variation on the first two solutions allowing Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.
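The voting idea behind the first solution can be simulated in a few lines. The sketch below follows the recursive oral-messages scheme usually written OM("m") in the Lamport, Shostak, and Pease paper, under several assumptions made only for this example: generals are numbered with integers, the default order is "retreat", and a traitor follows one fixed, deterministic strategy (it tells even-numbered receivers to attack and odd-numbered receivers to retreat). A real adversary could behave in any way; the names `om`, `send` and `majority` are hypothetical.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def majority(values):
    """Strict majority of the values, or RETREAT (the assumed default) if none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else RETREAT


def send(sender, receiver, value, traitors):
    """What `receiver` hears from `sender`.

    A loyal sender passes `value` on unchanged. A traitor (assumed strategy
    for this sketch only) ignores `value` and sends conflicting orders.
    """
    if sender in traitors:
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value


def om(commander, lieutenants, value, m, traitors):
    """Simulate OM(m); returns the order each lieutenant settles on.

    Entries for traitorous lieutenants are computed as if they were loyal
    and should be ignored by the caller.
    """
    # Step 1: the commander sends an order to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant relays what it heard to the others via OM(m - 1).
    relayed = {
        lt: om(lt, [x for x in lieutenants if x != lt], received[lt], m - 1, traitors)
        for lt in lieutenants
    }
    # Step 3: each lieutenant votes over its own order and the relayed ones.
    return {
        lt: majority(
            [received[lt]] + [relayed[other][lt] for other in lieutenants if other != lt]
        )
        for lt in lieutenants
    }
```

With four generals and one traitor, `om(0, [1, 2, 3], ATTACK, 1, traitors={3})` leaves the loyal Lieutenants 1 and 2 attacking as ordered, and `om(0, [1, 2, 3], ATTACK, 1, traitors={0})` leaves all three Lieutenants with the same order despite the Commander's conflicting messages. With only three generals, `om(0, [1, 2], ATTACK, 1, traitors={2})` makes the loyal Lieutenant 1 fall back to "retreat" even though the loyal Commander ordered an attack, which illustrates why "n" must be at least 3"t" + 1.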

See also

*Peer-to-peer
*Atomic commit

References

*L. Lamport, R. Shostak, and M. Pease (July 1982). "The Byzantine Generals Problem". ACM Trans. Programming Languages and Systems 4 (3): 382–401. doi:10.1145/357172.357176. http://research.microsoft.com/users/lamport/pubs/byz.pdf
*Castro, Miguel and Barbara Liskov (1999). "Practical Byzantine Fault Tolerance". Operating Systems Design and Implementation. http://www.pmg.lcs.mit.edu/~castro/osdi99_html/osdi99.html

External links

* [http://oceanstore.cs.berkeley.edu/ Ocean Store] replicates data with a Byzantine fault tolerant commit protocol.
* [http://citeseer.ist.psu.edu/malkhi98byzantine.html Byzantine Quorum Systems] Quorum systems for Byzantine-fault tolerant replication.
* [http://www.pmg.lcs.mit.edu/bft/ Practical Byzantine Fault Tolerance]

Wikimedia Foundation. 2010.
