ISO 31000 Risk management techniques - continued
There are twelve posts in this series. To read Part III, please click here.
Just to recap:
ISO 9001 Risk-based thinking could (and I am not saying that it should) be demonstrated by one or more of the risk assessment tools in ISO 31010.
Note: the text is based on the contents of Table A.2 – Attributes of a selection of risk assessment tools [Source: IEC/FDIS 31010:2009].
Continuing with ...
Root cause analysis (RCA)
Root Cause Analysis (RCA) uses a specific set of steps, with associated tools, to help find the primary cause of the problem; so that you can:
- Determine what happened.
- Determine why it happened.
- Figure out what to do to reduce the likelihood that it will happen again.
RCA assumes that systems and events are interrelated. An action in one area triggers an action in another, and another, and so on. By tracing back these actions, you can discover where the problem started and how it grew into the symptom you're now facing.1
Scenario analysis
Scenario analysis is a process of analyzing possible future events by considering alternative outcomes (sometimes called "alternative worlds").2
The technique can be used to identify risks by considering sets of scenarios that reflect (for example) ‘best case’, ‘worst case’ and ‘expected case’, in order to analyse potential consequences and their probabilities for each scenario as a form of sensitivity analysis when analysing risk.
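To make the 'best case', 'worst case' and 'expected case' idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the scenario names follow the text above, but the probabilities, the cost figures and the `Scenario` and `expected_consequence` names are my own assumptions and do not come from ISO 31010.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    probability: float  # assumed likelihood of this scenario (hypothetical value)
    consequence: float  # assumed cost impact, e.g. in thousands (hypothetical value)


def expected_consequence(scenarios):
    """Probability-weighted consequence across a set of mutually exclusive scenarios."""
    total_probability = sum(s.probability for s in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities should sum to 1")
    return sum(s.probability * s.consequence for s in scenarios)


# Hypothetical figures, chosen only to show the arithmetic.
scenarios = [
    Scenario("best case", 0.2, 10.0),
    Scenario("expected case", 0.6, 50.0),
    Scenario("worst case", 0.2, 200.0),
]

expected = expected_consequence(scenarios)
worst = max(scenarios, key=lambda s: s.consequence)
print(f"Expected consequence: {expected:.1f}")
print(f"Worst case: {worst.name} ({worst.consequence:.1f})")
```

Changing one probability or consequence at a time and re-running the calculation is a simple way of seeing which assumption the result is most sensitive to.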
The possible future scenarios or 'alternative worlds' are identified "...through imagination or extrapolation from the present and different risks considered assuming [that] each of these scenarios might occur. This can be done formally or informally, qualitatively or quantitatively."3
Toxicological / Environmental / Ecological risk assessment
An ecological risk assessment tells what happens to a bird, fish, plant or other non-human organism when it is exposed to a stressor, such as a pesticide.4
Aspects of the methodology, such as pathway analysis, which explores the different routes by which a target might be exposed to a source of risk, can be adapted and used across a very wide range of risk areas outside human health and the environment, and are useful in identifying treatments to reduce risk.5
The strength of this analysis is that it provides a very detailed understanding of the nature of the problem and the factors which increase risk. However, it needs good data, which are often not available or have a high level of uncertainty associated with them. It is also resource intensive and so is unlikely to find many uses in quality management systems.
Pathway analysis, though, is a useful tool, generally, for all areas of risk and permits the identification of how and where it may be possible to improve controls or introduce new ones.
If you are interested in following the steps of this type of environmental risk assessment process, I recommend that you read 'Basic Information about Risk Assessment Guidelines Development', published by the United States Environmental Protection Agency. See the web page link below:
Business impact analysis (BIA)
A Business Impact Analysis identifies an organization's exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, while maintaining competitive advantage and value system integrity.6
The analysis provided by a conscientiously conducted BIA could be of value when determining "...the external and internal issues that are relevant to the organization's purpose ... and that affect its ability to achieve the intended result(s) of its quality management system", as well as helping to determine who "the interested parties" are and which of their requirements are relevant to the quality management system (see ISO 9001:2015, Clause 4, Context of the organization).
If your organization already has a business continuity management system (BCMS) based on the ISO 22301 Standard, a BIA is a mandatory document, so seeking out your Business Continuity Manager to obtain the BIA report could be a sound move at this point. You will then have a valuable item of documented information to show risk-based thinking, because you will have assessed (by means of the BIA) how key disruption risks could affect the organization's operations and identified and quantified the capabilities that would be required to manage them.
If not, well ... you could consider conducting a BIA; although I would strongly recommend calling in a qualified business continuity consultant.
Fault tree analysis
A technique used in safety engineering and reliability engineering, mostly in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries. Fault tree analysis (FTA) can be used to understand how systems can fail, to identify the best ways to reduce risk or to determine or 'get a feel for' event rates of a safety accident or a particular system level (functional) failure. It sounds more complicated than it actually is; however, it is a resource hungry method.
If you are a Quality Manager in one of the above industries you will probably already be familiar with fault tree diagrams produced from this type of analysis and you may well use the fault trees developed by the organization to reduce or eliminate potential causes of non-conformities. They start with the undesired event (top event) and determine all the ways in which it could occur, shown graphically in a logical tree diagram.
Fault tree analysis is a time-consuming and costly exercise although it can be invaluable in determining the probability of (undesirable) outcomes.
FTA can be used to:
- understand the logic leading to the top event / undesired state.
- show compliance with the (input) system safety / reliability requirements.
- prioritize the contributors leading to the top event, creating the critical equipment/parts/events lists for different importance measures.
- monitor and control the safety performance of the complex system (e.g., is a particular aircraft safe to fly when fuel valve x malfunctions? For how long is it allowed to fly with the valve malfunction?).
- minimize and optimize resources.
- assist in designing a system. The FTA can be used as a design tool that helps to create (output / lower level) requirements.
- function as a diagnostic tool to identify and correct causes of the top event. It can help with the creation of diagnostic manuals / processes.7
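As a rough illustration of how a fault tree yields a probability for the top event, the sketch below combines basic events through AND and OR gates. It assumes that the basic events are statistically independent, which real systems often are not, and the event names and probabilities are hypothetical figures of my own rather than anything taken from ISO 31010.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every input event occurs (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities per demand.
pump_fails = 0.01
backup_pump_fails = 0.05
power_lost = 0.001

# Cooling is lost if both pumps fail, or if power is lost.
both_pumps_fail = and_gate([pump_fails, backup_pump_fails])
top_event = or_gate([both_pumps_fail, power_lost])

print(f"P(both pumps fail) = {both_pumps_fail:.6f}")
print(f"P(top event)       = {top_event:.6f}")
```

Even in a toy example like this one, the tree makes it obvious which contributor dominates the top event, which is the prioritization use mentioned in the list above.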
Event tree analysis
A forward, bottom-up, logical modeling technique for both success and failure that explores responses through a single initiating event and lays a path for assessing probabilities of the outcomes and overall system analysis. Using inductive reasoning, ETA translates probabilities of different initiating events into possible outcomes. It is arguably less resource intensive than fault tree analysis (see Table A.2 in ISO 31010).
ETA can be applied to a wide range of systems including: nuclear power plants, spacecraft, and chemical plants.8
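The forward logic of an event tree can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: every barrier is challenged on every path and works or fails independently of the others, whereas a real event tree would often prune branches. The barrier names, probabilities and the `event_tree` function are hypothetical.

```python
from itertools import product


def event_tree(initiating_frequency, barriers):
    """Return the frequency of every works/fails path through the barriers.

    barriers is a list of (name, probability_of_success) tuples, listed in
    the order in which the barriers are challenged.
    """
    outcomes = {}
    for path in product((True, False), repeat=len(barriers)):
        frequency = initiating_frequency
        labels = []
        for (name, p_success), works in zip(barriers, path):
            frequency *= p_success if works else (1.0 - p_success)
            labels.append(f"{name} {'works' if works else 'fails'}")
        outcomes[" / ".join(labels)] = frequency
    return outcomes


# Hypothetical initiating event: a small fire, 0.1 times per year.
barriers = [("alarm", 0.99), ("sprinkler", 0.95)]
outcomes = event_tree(0.1, barriers)

for path, frequency in outcomes.items():
    print(f"{path}: {frequency:.6f} per year")
```

The path frequencies always add up to the frequency of the initiating event, which is a useful sanity check on any event tree.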
Once again, if you are managing the quality system of a small enterprise in a relatively 'low risk' context, this technique is unlikely to be for you.
Cause and consequence analysis
ISO 31010 describes the Cause and consequence analysis method as:
"A combination of fault and event tree analysis that allows inclusion of time delays. Both causes and consequences of an initiating event are considered."
It starts from a critical event and analyses consequences by means of a combination of YES/NO logic gates which represent conditions that may occur or failures of systems designed to mitigate the consequences of the initiating event. The causes of the conditions or failures are analysed by means of fault trees (see ISO 31010, Clause B.15).
Cause-consequence analysis does provide a comprehensive view of the entire system. However, it is more complex than fault tree and event tree analysis, both to construct and in the manner in which dependencies are dealt with during quantification, and so requires more time and resources.
Cause-and-effect analysis
An effect can have a number of contributory factors which can be grouped in Ishikawa diagrams. Contributory factors are identified often through a brainstorming process (see Part II of this article for more information).
Ishikawa diagrams were popularized in the 1960s by Kaoru Ishikawa, who pioneered quality management processes in the Kawasaki shipyards. The basic concept was first used in the 1920s, and the diagram is considered one of the seven basic tools of quality control. Ishikawa diagrams are also known as fishbone diagrams because their shape resembles the side view of a fish skeleton.
The basic steps in performing a cause-and-effect analysis are as follows9:
- establish the effect to be analysed and place it in a box. The effect may be positive (an objective) or negative (a problem) depending on the circumstances;
- determine the main categories of causes represented by boxes in the Fishbone diagram. Typically, for a system problem, the categories might be people, equipment, environment, processes, etc. However, these are chosen to fit the particular context;
- fill in the possible causes for each major category with branches and sub-branches to describe the relationship between them;
- keep asking “why?” or “what caused that?” to connect the causes;
- review all branches to verify consistency and completeness and ensure that the causes apply to the main effect;
- identify the most likely causes based on the opinion of the team and available evidence.
The results are displayed as either an Ishikawa diagram or tree diagram.
FMEA (Failure modes and effects analysis) and FMECA (Failure modes and effects and criticality analysis)
FMEA/FMECA is an inductive reasoning (forward logic) single point of failure analysis and is a core task in reliability engineering, safety engineering and quality engineering. Quality engineering is especially concerned with the "Process" (Manufacturing and Assembly) type of FMEA.10
FMEA/FMECA is a systematic analysis technique that can be used to identify the ways in which components, systems or processes can fail to fulfil their design intent, highlighting:
- all potential failure modes of the various parts of a system (a failure mode is what is observed to fail or to perform incorrectly);
- the effects these failures may have on the system;
- the mechanisms of failure;
- how to avoid the failures, and/or mitigate the effects of the failures on the system.
It can also be used to:
- select design alternatives with high dependability;
- ensure that all failure modes of systems and processes, and their effects on operational success, have been considered;
- identify human error modes and effects;
- provide a basis for planning testing and maintenance of physical systems;
- improve the design of procedures and processes.
FMEA/FMECA also provides qualitative or quantitative information for other types of analysis, such as fault tree analysis, and is used in quality assurance applications. For example, it can produce a semi-quantitative measure of criticality known as the risk priority number (RPN), obtained by multiplying numbers from rating scales (usually between 1 and 10) for (a) consequence of failure, (b) likelihood of failure and (c) ability to detect the problem. Note that a failure is given a higher priority if it is difficult to detect.
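The RPN arithmetic is easy to sketch. In the example below the failure modes and their ratings are hypothetical, the field names `severity`, `occurrence` and `detection` are my own labels for ratings (a), (b) and (c) above, and the detection scale is assumed to run from 1 (easy to detect) to 10 (very hard to detect), so that hard-to-detect failures score higher.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # (a) consequence of failure, 1 to 10
    occurrence: int  # (b) likelihood of failure, 1 to 10
    detection: int   # (c) 1 = easy to detect, 10 = very hard to detect

    def rpn(self):
        """Risk priority number: the product of the three ratings."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings are assumed to lie between 1 and 10")
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings, for illustration only.
modes = [
    FailureMode("seal leaks", severity=7, occurrence=4, detection=3),
    FailureMode("label misprinted", severity=4, occurrence=6, detection=8),
    FailureMode("sensor drifts", severity=6, occurrence=3, detection=9),
]

ranked = sorted(modes, key=lambda m: m.rpn(), reverse=True)
for mode in ranked:
    print(f"{mode.description}: RPN = {mode.rpn()}")
```

Note how the misprinted label outranks the leaking seal despite its lower severity, simply because it is more likely to occur and harder to detect.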
Reliability-centred maintenance (RCM)
A technique that is used to achieve the required safety, availability and economy of operation (safe minimum levels of maintenance), so that assets continue to do what their users require in their operating context.
RCM allows you to identify applicable and effective preventive maintenance requirements for equipment "...in accordance with the safety, operational and economic consequences of identifiable failures, and the degradation mechanism responsible for those failures".11
RCM uses a failure mode, effect and criticality analysis (FMECA) type of risk assessment that requires a specific approach to analysis in this context. From a quality management standpoint, it's worth being aware that RCM identifies required functions and performance standards and failures of equipment and components that can interrupt those functions.
For more information, see IEC 60300-3-11, Dependability management – Part 3-11: Application guide – Reliability
Sneak analysis (SA) and sneak circuit analysis (SCA)
Sneak analysis is aimed at uncovering design flaws that allow 'sneak conditions' to develop, i.e. conditions that may cause unwanted actions or may inhibit a desired function, and that are not caused by component failure.
Sneak analysis can locate problems in both hardware and software using any technology. The sneak analysis tools can integrate several analyses such as fault trees, failure mode and effects analysis (FMEA), reliability estimates, etc. into a single analysis, saving time and project expenses.12 The technique helps in identifying design errors and works best when applied in conjunction with HAZOP. It is very good for dealing with systems which have multiple states, such as batch and semi-batch plant.
Sneak Circuit Analysis (SCA) is used in safety-critical systems to identify sneak (or hidden) paths in electronic and electro-mechanical systems that may cause unwanted action or inhibit desired functions. The analysis is based on identification of designed-in inadvertent modes of operation and is not based on failed equipment or software. SCA is most applicable to circuits that can cause irreversible events. These include:
a. Systems that control or perform active tasks or functions.
b. Systems that control electrical power and its distribution.
c. Embedded code which controls and times system functions.13
The SA process differs depending on whether it is applied to electrical circuits, process plants, mechanical equipment or software technology, and the method used is dependent on establishing correct network trees.
HACCP (Hazard analysis and critical control points)
HACCP is a systematic preventive approach to food safety from biological, chemical, and physical hazards in production processes that can cause the finished product to be unsafe, and designs measurements to reduce these risks to a safe level.14 HACCP has been recognized internationally as a logical tool for adapting traditional inspection methods to a modern, science-based, food safety system.15
HACCP focuses only on the health and safety issues of a product, ensuring that risks are minimized by controls throughout the process rather than through inspection of the end product. The seven HACCP principles are the basis of most food quality and safety assurance systems and, in the United States, HACCP compliance is regulated by 21 CFR parts 120 and 123. The HACCP principles are also included in the international standard ISO 22000:2005 for food safety management systems (FSMS). This standard is a complete food safety and quality management system incorporating the elements of prerequisite programmes (GMP & SSOP), HACCP and the quality management system, which together form an organization's Total Quality Management system.
Table A.1 – Applicability of tools used for risk assessment [see page 22 of ISO 31010], lists the HACCP technique as "Not Applicable" for analysis of probability or levels of risk.16 However, the principle of identifying the factors [risks] that can influence product quality, and defining process points where critical parameters can be monitored and hazards controlled, can be generalized for use in other technical systems.17
LOPA (Layers of Protection Analysis)
A technique for analysing whether there are sufficient measures to control or mitigate the risk of an undesired outcome.
The basic steps are:
- A cause-consequence pair is selected and the layers of protection which prevent the cause leading to the undesired consequence are identified.
- An order of magnitude calculation is then carried out to determine whether the protection is adequate to reduce risk to a tolerable level.18
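The order of magnitude calculation can be sketched as follows. This is an illustration under assumptions of my own: each layer of protection is treated as independent and is described by a probability of failure on demand (PFD), and the initiating frequency, PFD values and tolerable frequency are hypothetical numbers rather than figures from ISO 31010.

```python
from math import prod


def mitigated_frequency(initiating_frequency, pfds):
    """Frequency of the consequence once every protection layer has had to fail.

    pfds holds the probability of failure on demand of each layer, and the
    layers are assumed to be independent of one another.
    """
    return initiating_frequency * prod(pfds)


# Hypothetical cause-consequence pair: overfilling a tank leads to a spill.
initiating_frequency = 0.1   # overfill events per year (assumed)
tolerable_frequency = 1e-5   # tolerable spills per year (assumed)

layers = {"high-level alarm": 0.1, "automatic shut-off": 0.01}
current = mitigated_frequency(initiating_frequency, layers.values())
adequate_now = current <= tolerable_frequency

# Adding a further independent layer, e.g. a bund wall with an assumed PFD of 0.01.
layers["bund wall"] = 0.01
improved = mitigated_frequency(initiating_frequency, layers.values())
adequate_after = improved <= tolerable_frequency

print(f"Two layers:   {current:.1e} per year, adequate: {adequate_now}")
print(f"Three layers: {improved:.1e} per year, adequate: {adequate_after}")
```

With these made-up figures, two layers leave the scenario one order of magnitude short of the tolerable level, and the third layer closes the gap.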
LOPA is a less resource-intensive process than a fault tree analysis or a fully quantitative form of risk assessment, but is more rigorous than qualitative subjective judgements alone. It focuses efforts on the most critical layers of protection, identifying operations, systems and processes for which there are insufficient safeguards and where failure will have serious consequences. However, this technique looks at one cause-consequence pair and one scenario at a time and, therefore, does not apply to complex scenarios where there are many cause-consequence pairs or where a variety of consequences affect different stakeholders.
For more information, see:
Bow tie analysis
Bow-tie analysis is a simple diagrammatic way to display the pathways of a risk showing a range of possible causes and consequences. It is used in situations when a complex fault tree analysis is not justified or to ensure that there is a barrier or control for each of the possible failure pathways.
To understand how this works I recommend viewing a short video entitled "The Bow Tie Method in 5 Minutes" by CGE Risk Management Solutions19, which explains the basics of the method for risk assessment of hazards.
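If you prefer to see the idea as data rather than as a diagram, the sketch below records a bow tie as a simple Python dictionary and checks that every pathway has at least one barrier. The structure, the `unprotected_pathways` function and the example hazards are hypothetical choices of my own, not part of any bow tie software or of ISO 31010.

```python
def unprotected_pathways(bow_tie):
    """List the (side, pathway) pairs that have no barrier or control at all."""
    missing = []
    for side in ("threats", "consequences"):
        for pathway, barriers in bow_tie[side].items():
            if not barriers:
                missing.append((side, pathway))
    return missing


# Hypothetical bow tie: threats on the left, consequences on the right.
bow_tie = {
    "top_event": "loss of containment",
    "threats": {
        "corrosion": ["inspection programme"],
        "overpressure": [],  # no preventive barrier identified yet
    },
    "consequences": {
        "fire": ["deluge system"],
        "environmental release": ["bund wall"],
    },
}

gaps = unprotected_pathways(bow_tie)
for side, pathway in gaps:
    print(f"No barrier on the {side} side for: {pathway}")
```

Here the check flags the overpressure threat as the one pathway that still needs a control.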
ISO 31010 lists the following statistical methods for risk assessment:
- Markov analysis
- Monte-Carlo analysis
- Bayesian analysis
I will examine how these might be applied in the context of quality management systems and associated processes in a separate post.
There are twelve posts in this series. To read Part V, please click here.
1 Root Cause Analysis, Tracing a Problem to its Root Origins, Mind Tools website: http://www.mindtools.com/pages/article/newTMC_80.htm
2 Scenario Analysis, Wikipedia: http://en.wikipedia.org/wiki/Scenario_analysis.
3 ISO/IEC 31010:2009, Table A.2 - Attributes of a selection of risk assessment tools.
4 Ecological Risk Assessment: Technical Overview, Ecological Risk Assessment Process, U.S. Environmental Protection Agency website: http://www.epa.gov/oppefed1/ecorisk_ders/index.htm#WITERAP
5 ISO/IEC 31010:2009, B.8.2 Use, p.37.
6 Elliot, D.; Swartz, E.; Herbane, B. (1999) Just waiting for the next big bang: business continuity planning in the UK finance sector. Journal of Applied Management Studies, Vol. 8, No, pp. 43–60. Here: p. 48
7 Fault tree analysis, Wikipedia: http://en.wikipedia.org/wiki/Fault_tree_analysis
8 Event Tree Analysis, Wikipedia: http://en.wikipedia.org/wiki/Event_tree_analysis.
9 ISO/IEC 31010:2009, B.17.4 Process, p.57.
10 Failure mode and effects analysis, Wikipedia: http://en.wikipedia.org/wiki/Failure_mode_and_effects_analysis
11 ISO/IEC 31010:2009, B.22.1 Overview, p.66
12 Ibid., B.23.2 Use, p.68.
13 Sneak circuit analysis, Wikipedia: http://en.wikipedia.org/wiki/Sneak_circuit_analysis
14 Hazard analysis and critical control points, Wikipedia: http://en.wikipedia.org/wiki/Hazard_analysis_and_critical_control_points
16 ISO/IEC 31010:2009, Table A.1 – Applicability of tools used for risk assessment, p.22
17 Ibid., B.7.2 Use, p.35.
18 Ibid., B.18 Layers of protection analysis (LOPA), p.59.
19 The Bow Tie Method in 5 Minutes, CGE Risk Management Solutions, YouTube: https://www.youtube.com/watch?v=P7Z6L7fjsi0
This post was written by Michael Shuff