- Alarm management
Alarm management is the application of human factors (or ergonomics, as the field is referred to outside the U.S.) along with instrumentation engineering and systems thinking to manage the design of an alarm system to increase its usability. Most often the major usability problem is that too many alarms are annunciated in a plant upset, commonly referred to as an alarm flood, since it resembles a flood caused by excessive rainfall input with a basically fixed drainage output capacity. However, an alarm system can also have other problems, such as poorly designed alarms, improperly set alarm points, ineffective annunciation, and unclear alarm messages.
Alarm Problem History
From their conception, large chemical, refining, power generation, and other processing plants required a control system to keep the process operating successfully and producing products. Because the components were fragile compared to the process, these control systems often required a control room to protect them from the elements and process conditions. Early control rooms used what were referred to as "panel boards", which were loaded with control instruments and indicators. These were tied to sensors located in the process streams and on the outside of process equipment. The sensors relayed their information to the control instruments via 4-20 mA current loops carried on twisted pair wiring. At first these systems merely yielded information, and a well-trained operator was required to make adjustments, either by changing flow rates or by altering energy inputs, to keep the process within its designed limits.
Alarms were added to alert the operator to a condition that was about to exceed a design limit, or had already exceeded one. Additionally, Emergency Shut Down (ESD) systems were employed to halt a process that was in danger of exceeding safety, environmental, or monetarily acceptable process limits. Alarms were indicated to the operator by annunciator horns and by lights of different colors. (For instance, green lights meant OK, yellow meant not OK, and red meant bad.) Panel boards were usually laid out in a manner that replicated the process flow in the plant, so the instrumentation for operating units within the plant was grouped together for ease of recognition and problem solving. It was a simple matter to look at the entire panel board and discern whether any section of the plant was running poorly. This was due both to the design of the instruments and to the implementation of the alarms associated with them. Instrumentation companies put a lot of effort into the design and individual layout of the instruments they manufactured. To do this they employed behavioral psychology practices, which revealed how much information a human being could collect in a quick glance. More complex plants had more complex panel boards, and therefore often more human operators or controllers.
Thus, in the early days of panel board systems, alarms were regulated by both real estate and cost. In essence, they were limited by the amount of available board space and by the cost of running wiring and hooking up an annunciator (horn), an indicator (light), and the switches used to acknowledge an alarm and to clear a resolved one. Adding a new alarm often meant deciding which old one to give up.
As technology developed, control systems and control methods were expected to deliver a higher degree of plant automation with each passing year. Highly complex material processing called for highly complex control methodologies. Also, global competition pushed manufacturing operations to increase production while using less energy and producing less waste. In the days of the panel boards, a special kind of engineer was required, one who understood the electronic equipment associated with process measurement and control, the control algorithms necessary to control the process (PID basics), and the actual process used to make the products. Around the mid-1980s, the digital revolution arrived. Digital control systems (DCS, an abbreviation that originally stood for Distributed Control System, before the systems became digital) were a boon to the industry. The engineer could now control the process without having to understand the equipment necessary to perform the control functions. Panel boards were no longer required, because all of the information that once came across analog instruments could be digitized, stuffed into a computer, and manipulated to achieve the same control actions once performed with amplifiers and potentiometers.
As a side effect, alarms became easy and cheap to configure and deploy. One simply typed in a location and a value to alarm on, and set the alarm to active. The unintended result was that soon people alarmed everything. Initial installers set alarms at 80% and 20% of the operating range of any variable just as a habit. Another unfortunate consequence of the digital revolution was that what once covered several square yards of real estate now had to fit into a 17-inch computer monitor. Multiple pages of information were thus employed to replicate the information on the replaced panel board. Alarms were used to tell an operator to go look at a page he was not viewing. Alarms were used to tell an operator that a tank was filling. Every mistake made in operations usually resulted in a new alarm. With the implementation of the OSHA 1910 regulations, HAZOP studies usually requested several new alarms. Alarms were everywhere. Incidents began to accrue as too much data collided with too little useful information.
Alarm Management History
Recognizing that alarms were becoming a problem, users banded together and formed the Alarm Management Workgroup. It was an alliance of operating companies from chemical, petrochemical, and refining operations. They gathered and wrote a document on the issues associated with alarm management. This group quickly realized that alarm problems were simply a subset of a larger problem, and formed the ASM Consortium (ASM is a registered trademark of Honeywell, and stands for Abnormal Situation Management). The ASM Consortium was originally chartered by NIST (the National Institute of Standards and Technology) and the group of users. Essentially, they realized that alarms exist because of a problem referred to as situation awareness.
The Alarm Management Consortium produced documents on best practices in alarm management. It further produced documentation on other best practices in operator situation awareness, operator effectiveness, and other operator-oriented issues. Some of these documents are available at their website.
The ASM Consortium funded alarm management guidelines published by EEMUA in the UK. By providing data from their member companies and contributing to the editing of the guidelines, they helped produce EEMUA 191, "Alarm Systems - A Guide to Design, Management and Procurement".
Several institutions and societies are producing standards on alarm management to assist their members in the best-practice use of alarms in industrial manufacturing systems. Among them are the ISA (ISA SP-18), API (API 1167), and Namur (Namur 102). Several companies also offer software packages to assist users in dealing with alarm management issues. Among them are DCS manufacturing companies and third-party vendors who offer add-on systems.
Alarm Management Concepts
The fundamental purpose of alarm annunciation is to alert the operator to deviations from normal operating conditions, i.e. abnormal operating situations. The ultimate objective is to prevent, or at least minimize, physical and economic loss through operator intervention in response to the condition that was alarmed. For most digital control system users, losses can result from situations that threaten environmental safety, personnel safety, equipment integrity, economy of operation, and product quality control as well as plant throughput. A key factor in operator response effectiveness is the speed and accuracy with which the operator can identify the alarms that require immediate action.
By default, the assignment of alarm trip points and alarm priorities constitutes basic alarm management. Each individual alarm is designed to provide an alert when that process indication deviates from normal. The main problem with basic alarm management is that these features are static. The resultant alarm annunciation does not respond to changes in the mode of operation or the operating conditions.
When a major piece of process equipment like a charge pump, compressor, or fired heater shuts down, many alarms become unnecessary. These alarms are no longer independent exceptions from normal operation. They indicate, in that situation, secondary, non-critical effects and no longer provide the operator with important information. Similarly, during startup or shutdown of a process unit, many alarms are not meaningful. This is often the case because the static alarm conditions conflict with the required operating criteria for startup and shutdown.
In all cases of major equipment failure, startups, and shutdowns, the operator must search alarm annunciation displays and analyze which alarms are significant. This wastes valuable time when the operator needs to make important operating decisions and take swift action. If the resultant flood of alarms becomes too great for the operator to comprehend, then the basic alarm management system has failed as a system that allows the operator to respond quickly and accurately to the alarms that require immediate action. In such cases, the operator has virtually no chance to minimize, let alone prevent, a significant loss.
In short, one needs to extend the objectives of alarm management beyond the basic level. It is not sufficient to utilize multiple priority levels, because priority itself is often dynamic. Likewise, disabling alarms based on unit association or suppressing audible annunciation based on priority does not provide dynamic, selective alarm annunciation. The solution must be an alarm management system that can dynamically filter the process alarms based on the current plant operation and conditions, so that only the currently significant alarms are annunciated.
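As an illustration only, dynamic filtering of this kind can be sketched as a lookup of the operating modes in which each alarm is meaningful. The tag names, mode names, and the `Alarm` structure below are hypothetical and are not taken from any particular control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    tag: str                  # hypothetical instrument tag, e.g. "FI-101.LO"
    priority: int             # 1 = highest priority
    active_modes: frozenset   # operating modes in which the alarm is meaningful


def significant_alarms(raised, current_mode):
    """Keep only the raised alarms that are meaningful in the current mode."""
    return [a for a in raised if current_mode in a.active_modes]


# Hypothetical alarms currently raised on a unit.
raised = [
    Alarm("FI-101.LO", 2, frozenset({"NORMAL"})),
    Alarm("TI-205.HI", 1, frozenset({"NORMAL", "STARTUP", "SHUTDOWN"})),
    Alarm("PI-310.LO", 3, frozenset({"NORMAL"})),
]

# During startup only the high-temperature alarm is annunciated.
for alarm in significant_alarms(raised, "STARTUP"):
    print(alarm.tag)
```

In this sketch the low-flow and low-pressure alarms are still evaluated but are not presented during startup, which is the behavior the static configuration cannot provide.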
The fundamental purpose of dynamic alarm annunciation is to alert the operator to relevant abnormal operating situations. They include situations that have a necessary or possible operator response to ensure:
*Personnel and Environmental Safety,
*Product Quality Control.
The ultimate objectives are no different from the previous basic alarm annunciation management objectives. Dynamic alarm annunciation management focuses the operator’s attention by eliminating extraneous alarms, providing better recognition of critical problems, and ensuring swifter, more accurate operator response. [Jensen, Leslie D. [http://www.prosys.com/UG95_DCS.pdf "Dynamic Alarm Management on an Ethylene Plant"]. Retrieved 2008-05-22.]
The Need for Alarm Management
Alarm management is usually necessary in a process manufacturing environment that is controlled by an operator using a control system, such as a Distributed Control System (DCS) or a Programmable Logic Controller (PLC). Such a system may have hundreds of individual alarms that, until very recently, have probably been designed with only limited consideration of other alarms in the system. Since humans can only do one thing at a time and can pay attention to a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that can be assimilated by a human operator, particularly when the plant is upset or in an unusual condition. Alarms also need to be capable of directing the operator's attention to the most important problem that he or she needs to act upon, for instance by using a priority to indicate degree of importance or rank. A good illustration of this problem comes from the old US sitcom "MASH". A common scene was Radar O'Reilly slipping a requisition for something that Hawkeye wanted into the stack of papers for Colonel Potter to sign. In much the same way, if alarms are unprioritized, the important ones can be mixed in with lower-value nuisance ones.
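A minimal sketch of priority-based presentation, assuming each active alarm is a hypothetical `(priority, timestamp, tag)` tuple in which 1 is the most important priority:

```python
def annunciation_order(alarms):
    """Order alarms by priority first, then oldest first within a priority."""
    return sorted(alarms, key=lambda a: (a[0], a[1]))


# Hypothetical active alarms: (priority, seconds since start of log, tag).
active = [
    (3, 100.0, "LI-410.HI"),
    (1, 105.0, "PI-120.HIHI"),
    (3, 90.0, "TI-330.LO"),
    (2, 101.0, "FI-101.LO"),
]

for priority, raised_at, tag in annunciation_order(active):
    print(priority, tag)
```

The most recent alarm here is also the most important one, and ordering by priority places it at the top of the list instead of leaving it buried among lower-priority entries.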
Some Improvement Methods
The techniques for achieving rate reduction range from the extremely simple ones of reducing nuisance and low value alarms to redesigning the alarm system in a holistic way that considers the relationships among individual alarms.
The first step in a continuous improvement program is often to measure the alarm rate and to resolve any chronic problems, such as alarms that have no use (often described as alarms that do not require the operator to take an action).
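Measuring the alarm rate can be sketched as counting alarm timestamps in fixed windows. The 10-minute window and the threshold of 10 alarms per window used below are assumptions chosen for illustration, and a site would set its own figures in its alarm philosophy.

```python
from collections import Counter

WINDOW_S = 600         # assumed window length: 10 minutes
FLOOD_THRESHOLD = 10   # assumed number of alarms per window treated as a flood


def alarms_per_window(timestamps, window_s=WINDOW_S):
    """Count alarms in fixed, non-overlapping windows, keyed by window index."""
    return Counter(int(t // window_s) for t in timestamps)


def flood_windows(timestamps, window_s=WINDOW_S, threshold=FLOOD_THRESHOLD):
    """Return the indices of windows whose alarm count reaches the threshold."""
    counts = alarms_per_window(timestamps, window_s)
    return sorted(w for w, n in counts.items() if n >= threshold)


# Hypothetical alarm log: seconds since the start of the log.
# Three isolated alarms, then twelve alarms within one minute.
timestamps = [30, 200, 650] + [1200 + 5 * i for i in range(12)]

print(dict(alarms_per_window(timestamps)))
print(flood_windows(timestamps))
```

Fixed windows keep the sketch short; a sliding window would catch bursts that straddle a window boundary.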
The next step involves documenting the methodology or philosophy of how to design alarms. It can include things such as what to alarm, standards for alarm annunciation and text messages, and how the operator will interact with the alarms.
Documentation and Rationalization
This phase is a detailed review of all alarms to document their design purpose, and to ensure that they are selected and set properly and meet the design criteria. Ideally this stage will result in a reduction of alarms, although it does not always do so.
The above steps will often still fail to prevent an alarm flood in an operational upset, so advanced methods such as alarm suppression under certain circumstances are then necessary. As an example, shutting down a pump will always cause a low flow alarm on the pump outlet flow, so the low flow alarm may be suppressed if the pump was shut down: it adds no value for the operator, who already knows it was caused by the pump being shut down. This technique can of course get very complicated and requires considerable care in design. In the above case, for instance, it can be argued that the low flow alarm does add value, as it confirms to the operator that the pump has indeed stopped.
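The pump example can be sketched as a rule table that links an alarm to the equipment state under which it should be suppressed. The tag names and the rule format are hypothetical. The sketch annunciates the alarm whenever the pump state is unknown, which is one possible design choice, not a requirement.

```python
def annunciate(alarm_tag, process_state, suppression_rules):
    """Return True if the alarm should be presented to the operator."""
    rule = suppression_rules.get(alarm_tag)
    if rule is None:
        return True  # no rule for this alarm: always annunciate
    state_tag, suppress_when = rule
    # Suppress only when the linked state is known and matches the rule.
    return process_state.get(state_tag) != suppress_when


# Hypothetical rule: suppress the low-flow alarm while the pump is not running.
rules = {"FI-101.LO": ("P-101.RUN", False)}

print(annunciate("FI-101.LO", {"P-101.RUN": False}, rules))  # pump stopped
print(annunciate("FI-101.LO", {"P-101.RUN": True}, rules))   # pump running
```

A site that values the low flow alarm as confirmation that the pump has stopped would simply leave this rule out of the table.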
Alarm management becomes more and more necessary as the complexity and size of manufacturing systems increase. Much of the need for alarm management also arises because alarms can be configured on a DCS at nearly zero incremental cost, whereas in the past, on physical control panel systems that consisted of individual pneumatic or electronic analog instruments, each alarm required expenditure and control panel real estate, so more thought usually went into the need for an alarm. Numerous disasters such as Three Mile Island and the Chernobyl accident have established a clear need for alarm management.
The Seven Steps to Alarm Management
Step 1: Create and Adopt an Alarm Philosophy
A comprehensive design and guideline document that makes it clear “exactly how to do alarms right.”
Step 2: Alarm Performance Benchmarking
Analyze the alarm system to determine its strengths and deficiencies, and effectively map out a practical solution to improve it.
Step 3: “Bad Actor” Alarm Resolution
From experience, it is known that around half of the entire alarm load usually comes from relatively few alarms. The methods for making them work properly are documented, and can be applied with minimum effort and maximum performance improvement.
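Finding the "bad actors" can be sketched as ranking alarm tags by how often they occur and taking the most frequent ones until they account for a chosen share of the load. The tag names and counts below are invented for illustration, and the 50% share mirrors the "around half" figure above.

```python
from collections import Counter


def bad_actors(alarm_tags, share=0.5):
    """Return the most frequent tags that together cover `share` of the load."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    selected, covered = [], 0
    for tag, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append((tag, n))
        covered += n
    return selected


# Hypothetical alarm journal: one entry per alarm occurrence.
log = (
    ["LI-410.HI"] * 40
    + ["FI-101.LO"] * 25
    + ["TI-330.LO"] * 20
    + ["PI-120.HI"] * 10
    + ["AI-500.HI"] * 5
)

for tag, n in bad_actors(log):
    print(tag, n)
```

In this invented journal, two of the five tags account for 65 of the 100 alarm occurrences.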
Step 4: Alarm Documentation and Rationalization (D&R)
A full overhaul of the alarm system to ensure that each alarm complies with the alarm philosophy and the principles of good alarm management.
Step 5: Alarm System Audit and Enforcement
DCS alarm systems are notoriously easy to change and generally lack proper security. Methods are needed to ensure that the alarm system does not drift from its rationalized state.
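One way to detect such drift, sketched here under the assumption that both the rationalized settings and the live settings can be exported as dictionaries keyed by alarm tag, is a periodic comparison of the two. The tags, field names, and values are hypothetical.

```python
def audit(master, live):
    """Compare live alarm settings with the rationalized master settings."""
    findings = []
    for tag in sorted(master.keys() | live.keys()):
        if tag not in live:
            findings.append((tag, "missing from live system"))
        elif tag not in master:
            findings.append((tag, "not in master database"))
        elif master[tag] != live[tag]:
            findings.append((tag, f"expected {master[tag]}, found {live[tag]}"))
    return findings


# Hypothetical rationalized settings and current live settings.
master = {
    "TI-205.HI": {"limit": 350.0, "priority": 1},
    "FI-101.LO": {"limit": 12.0, "priority": 2},
}
live = {
    "TI-205.HI": {"limit": 365.0, "priority": 1},   # limit was changed
    "FI-101.LO": {"limit": 12.0, "priority": 2},
    "LI-410.HI": {"limit": 90.0, "priority": 3},    # alarm added without review
}

for tag, problem in audit(master, live):
    print(tag, problem)
```

Enforcement would go one step further and write the master values back to the live system, which this sketch does not attempt.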
Step 6: Real-Time Alarm Management
More advanced alarm management techniques are often needed to ensure that the alarm system properly supports, rather than hinders, the operator in all operating scenarios. These include Alarm Shelving, State-Based Alarming, and Alarm Flood Suppression technologies.
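Alarm shelving can be sketched as a table of expiry times, so that an alarm the operator has temporarily set aside returns automatically. The class below is a hypothetical illustration; the time is passed in explicitly so the behavior is easy to follow, and the 30-minute duration is an assumption.

```python
class Shelf:
    """Temporarily hide selected alarms; shelved alarms return automatically."""

    def __init__(self):
        self._expiry = {}  # tag -> time (in seconds) at which shelving ends

    def shelve(self, tag, now, duration_s):
        """Shelve an alarm for a limited time, starting at `now`."""
        self._expiry[tag] = now + duration_s

    def is_shelved(self, tag, now):
        """Return True while the alarm is still within its shelving period."""
        expiry = self._expiry.get(tag)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expiry[tag]  # shelving has timed out
            return False
        return True


shelf = Shelf()
shelf.shelve("LI-410.HI", now=0.0, duration_s=1800)  # assumed 30-minute shelf

print(shelf.is_shelved("LI-410.HI", now=600.0))    # still shelved
print(shelf.is_shelved("LI-410.HI", now=1800.0))   # shelving has expired
```

The automatic expiry is the point of the design: a shelved alarm cannot be forgotten indefinitely, unlike an alarm that has simply been disabled.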
Step 7: Control and Maintain Alarm System Performance
Proper management of change and longer term analysis and KPI monitoring are needed, to ensure that the gains that have been achieved from performing the steps above do not dwindle away over time. Otherwise they will; the principle of “entropy” definitely applies to an alarm system.
List of human-computer interaction topics, since most control systems are computer-based
Design, especially interaction design
* [http://www.tipsweb.com/about/events/conference2008/papers.asp "Papers from TiPS User Conference 2008"]
* [http://processingpas.com/amhandbook.html "The Alarm Management Handbook"]
* [http://www.hse.gov.uk/pubns/chis6.pdf "Better Alarm handling", from the British Government's Health and Safety Executive (HSE)]
EEMUA191 Alarm Systems - A Guide to Design, Management and Procurement (1999) ISBN 0-85931-076-0
* [http://www.ptil.no/regelverk/R2002/ALARM_SYSTEM_DESIGN_E.HTM "Principles for alarm system design" Norwegian Petroleum Directorate]
* [http://www.alarmmanagement.EU Alarmmanagement Willem Hazenberg BSc Thesis about the businesscase]
* [http://www.matrikon.com/portal/apd/summary.aspx "Build your own Alarm Philosophy Document"]
Wikimedia Foundation. 2010.