- Cascade failure
A cascade failure is a series of events on the internet in which network traffic to or between larger sections of the internet is severely impaired or halted because of failing or disconnected hardware or software. Somewhat similar to the more generic cascading failure (found in, for instance, electrical systems), a cascade failure can affect large groups of people and systems.
Causes
The cause of a cascade failure is usually the overloading of a single, crucial router or node. The overload causes the node to go down, even if only briefly, so that traffic is rerouted to or through another (alternative) path. This alternative path becomes overloaded as a result, causing it to go down as well, and so on. The failure also affects systems that depend on the node for regular operation.
A cascade failure can also be triggered by taking a node down for maintenance or upgrades.
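The overload-and-reroute mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real routing protocol: the function name `simulate_cascade`, the four-node topology, and the rule that a failed node's traffic is split evenly among its surviving neighbours are all hypothetical choices made for illustration.

```python
def simulate_cascade(capacity, load, neighbors, first_failure):
    """Return the order in which nodes fail once `first_failure` goes down.

    Assumption (illustrative only): a failed node's traffic is split
    evenly among its neighbours that are still up.
    """
    load = dict(load)              # work on a copy, keep the caller's data intact
    failed = []
    queue = [first_failure]
    while queue:
        node = queue.pop(0)
        if node in failed:
            continue
        failed.append(node)
        alive = [n for n in neighbors[node] if n not in failed]
        if not alive:
            continue               # nowhere to reroute: the traffic is dropped
        share = load[node] / len(alive)
        load[node] = 0.0
        for n in alive:
            load[n] += share
            if load[n] > capacity[n] and n not in queue:
                queue.append(n)    # the alternative path is now overloaded too
    return failed


# Hypothetical topology: A is the crucial node, D is a high-capacity node.
capacity = {"A": 100, "B": 100, "C": 100, "D": 300}
load = {"A": 90, "B": 60, "C": 60, "D": 50}
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

print(simulate_cascade(capacity, load, neighbors, "A"))
```

In this toy topology the failure of A pushes B and C over their capacity in turn, while D has enough headroom to absorb what is left. The starting failure can stand for an overload or for a node taken down for maintenance; the sketch treats both the same way.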
Symptoms
The symptoms of a cascade failure are easy to see: packet loss and high network latency, not just to single systems, but to whole sections of a network or the internet. The high latency and packet loss are caused by nodes that fail to operate due to congestion collapse: they are still present in the network, but little or no useful communication passes through them. As a result, routes can still be considered valid even though they no longer provide communication.
If enough routes go down because of a cascade failure, a complete section of the network or internet can become unreachable. Although undesired, this can help speed up recovery from the failure: connections time out and other nodes give up trying to establish connections to the section(s) that have been cut off, which decreases the load on the involved nodes.
A common sight during a cascade failure is a walking failure, in which one section goes down and causes the next section to fail, after which the first section comes back up. This ripple can make several passes through the same sections or connecting nodes before stability is restored.
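The walking failure can be sketched as a wave of displaced traffic travelling around a ring of sections. The model below is a deliberately simplified illustration: the function `walking_failure`, the ring layout, and the `retention` factor (the fraction of displaced traffic that is still being retried after each hop, the rest having timed out) are assumptions introduced here and are not taken from any measurement.

```python
def walking_failure(sections, base, capacity, retention, max_steps=100):
    """Return the order in which sections go down, starting with section 0.

    Assumptions (illustrative only): sections form a ring, every section
    carries `base` traffic of its own, and the displaced traffic shrinks
    by `retention` each time it moves on because some connections time out.
    """
    displaced = float(base)        # section 0 goes down; its traffic is displaced
    order = [0]
    current = 0
    for _ in range(max_steps):     # guard against a wave that never dies out
        nxt = (current + 1) % sections
        if base + displaced <= capacity:
            break                  # the next section absorbs the wave
        order.append(nxt)          # next section overloads, previous one recovers
        displaced *= retention
        current = nxt
    return order


# Three sections, each with 60 units of traffic and room for 100.
print(walking_failure(sections=3, base=60, capacity=100, retention=0.9))
```

With these hypothetical numbers the ripple passes through all three sections once and starts a second pass before the displaced traffic has shrunk enough to be absorbed.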
History
Cascade failures are a relatively recent development, brought on by the massive increase in traffic and the high interconnectivity between systems and networks. The term was first applied in this context in the late 1990s by a Dutch IT professional and has slowly become a relatively common term for this kind of large-scale failure.
Example
As an example, consider an overloaded connecting node between a local ISP and its Internet backbone:
Initially, the traffic that would normally go through the node is stopped. Systems and users get errors about not being able to reach hosts. Usually, the redundant systems of an ISP respond very quickly, choosing another path through a different backbone. This alternative route is longer, with more hops, and therefore passes through more systems that do not normally process the amount of traffic suddenly offered to them.
This can cause one or more systems along the alternative route to go down, causing similar problems of their own.
Related systems are also affected in this case. For example, DNS resolution might fail, and the loss of a service that normally keeps systems interconnected might break connections that do not even directly involve the systems that went down. This, in turn, may cause seemingly unrelated nodes to develop problems, which can set off another cascade failure all on its own.
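The longer alternative route in this example can be made concrete with a fewest-hop path search. The sketch below uses a plain breadth-first search rather than any real routing protocol, and the node names (`isp`, `gw1`, `gw2`, `regional`, `backbone_a`, `backbone_b`, `host`) describe a hypothetical topology invented for illustration.

```python
from collections import deque


def shortest_path(links, src, dst, down=()):
    """Breadth-first search for the fewest-hop path that avoids `down` nodes."""
    if src in down or dst in down:
        return None
    previous = {src: None}         # also serves as the set of visited nodes
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for peer in links.get(node, ()):
            if peer not in previous and peer not in down:
                previous[peer] = node
                queue.append(peer)
    return None                    # destination unreachable


# Hypothetical topology: gw1 is the connecting node that gets overloaded.
links = {
    "isp": ["gw1", "gw2"],
    "gw1": ["isp", "backbone_a"],
    "backbone_a": ["gw1", "host"],
    "gw2": ["isp", "regional"],
    "regional": ["gw2", "backbone_b"],
    "backbone_b": ["regional", "host"],
    "host": ["backbone_a", "backbone_b"],
}

normal = shortest_path(links, "isp", "host")
detour = shortest_path(links, "isp", "host", down={"gw1"})
print(len(normal) - 1, "hops normally,", len(detour) - 1, "hops after gw1 fails")
```

The detour adds a hop and, more importantly, pushes the ISP's traffic through `gw2`, `regional` and `backbone_b`, which did not carry it before. Those are the systems at risk of failing next.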
See also
* Chain reaction
* Congestion collapse
References
* Toshiyuki Miyazaki (2005-03-01). "Comparison of defense strategies for cascade breakdown on SF networks with degree correlations". http://www.jaist.ac.jp/library/thesis/ks-master-2005/abstract/tmiyazak/abstract.pdf
* Russ Cooper (2005-06-01). "(In)Secure Shell?". RedmondMag.com. http://redmondmag.com/columns/print.asp?EditorialsID=1000. Retrieved 2007-09-08.
* US Department of Homeland Security (2007-02-05). "Cascade Net (simulation program)". Center for Homeland Defense and Security. http://www.chds.us/?research/software&d=list. Retrieved 2007-09-08.
Wikimedia Foundation. 2010.