ITIL’s guidance on Risk Management has changed through the years. In ITIL v2, it was based on the CCTA Risk Analysis and Management Method (the acronym is CRAMM, I kid you not). It looked at the combination of Assets, Vulnerabilities, and Threats to derive a Risk score to which Countermeasures were applied.
ITIL v3 2011 defers Risk Management to:
- Management of Risk (which is abbreviated as M_o_R and is part of the Global Best Practice portfolio of AXELOS Ltd.)
- ISO 31000
- ISO/IEC 27001
- Risk IT (part of the IT Governance portfolio of ISACA)
I actually have a soft spot in my heart for the Microsoft Operations Framework (MOF) Risk Management model, which is kind of strange, because I don’t have a soft spot for a lot of MOF’s other concepts.
Every single one of these frameworks has pretty graphics with boxes and arrows trying to show you how to manage Risk. None of the pretty pictures are worth a damn thing in the real world of Risk Management. OK, maybe they do have value; I just find it difficult to translate the pretty pictures into real-world actions.
When I taught ITIL Foundation courses, I would teach the attendees what they needed to pass any questions on the certification exam (which for ITIL v2 is pretty much the abbreviation CRAMM and for ITIL v3 is nothing) and then show them my way of managing Risk in an IT environment (it’s a mashup of MOF and CRAMM).
First you need a list of:
- Assets – Identify which Assets could seriously impact the organization if they were impaired, but be focused and keep the list as short as possible. It is way too easy for groups to get lost down the rabbit hole and start to worry about the Risk of a keyboard or headset failure.
So now you have a list of Assets, either physical or virtual, which could significantly impact the organization if impaired. For each Asset you need to identify:
- Vulnerabilities – Determine which Vulnerabilities could disrupt the business. Obviously the primary file server going down is a huge Vulnerability, but what if it runs out of disk space? What if it slows down by 50%? What if all the files suddenly show up as locked? Again, we don’t want to go too far down this path. I would recommend picking the three biggest Vulnerabilities of each Asset and focusing on those.
- Threats – Imagine what Threats could cause the manifestation of the Vulnerability. The server catching on fire is a Threat that will cause the server to go down. So could setting off the water-based fire extinguisher system. Don’t forget to think about acts of God like earthquakes, tornadoes, sharknadoes, etc.
- Probabilities – Estimate the Probability that any individual Threat will manifest. I always make the mistake of assuming the Probability of a sharknado is much higher than it really is.
- Impacts – Know the Impact of any Threat if it materialized. Will the organization lose money? Will people’s lives be at stake? Could it possibly prevent a famous actor from getting an Academy Award?
So with the Asset’s Vulnerabilities, Threats, Probabilities, and Impacts defined, you can determine the Risk value. For some of these Threats, you may already have (or plan to implement) some form of Countermeasure to reduce the impact of a Threat if and when it materializes.
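If you would rather keep these lines somewhere other than a spreadsheet, one row could be sketched roughly like this. The class name and every field name are assumptions made up for illustration; only the file server example itself comes from the list above, and the last three values are deliberately left as placeholders.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One Asset/Vulnerability/Threat line. All field names are hypothetical."""
    asset: str
    vulnerability: str
    threat: str
    probability: str          # your estimate, in whatever wording you use
    impact: str               # what happens to the organization
    risk: str                 # the resulting Risk value
    countermeasure: str = ""  # empty until one exists or is planned
    notes: str = ""           # what happened if the Threat materialized


# Asset, Vulnerability, and Threat come from the file server example above;
# the remaining values are placeholders, not real estimates.
entry = RiskEntry(
    asset="Primary file server",
    vulnerability="Server goes down",
    threat="Server catches fire",
    probability="(to be estimated)",
    impact="(to be assessed)",
    risk="(to be determined)",
)
print(entry.asset, "|", entry.threat, "|", entry.countermeasure or "no Countermeasure yet")
```

Leaving the Countermeasure and notes fields optional mirrors the fact that many lines start out with neither.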
So now that I’ve confused you with WAY too many words, let’s look at it in the real world.
Let’s discuss the Risk associated with two different companies:
- A restaurant company with no technical employees
- An on-line retail organization with no brick-and-mortar stores
Both companies have two similar Assets:
- A website server hosted by the same hosting company
- Credit card processing point of sale terminal (restaurant) and credit card processing for ecommerce transactions (on-line retailer)
Loss of these Assets affects the companies quite differently, as we will see in the Impact statements.
| Asset | Vulnerability | Threat | Probability | Impact | Risk | Countermeasure | Notes |
|---|---|---|---|---|---|---|---|
| Promotional website | Web server goes down or is otherwise unavailable | DDOS of host provider | 50% chance of 5 hours outage/yr | Little to no loss in revenue, inability for potential customers to see on-line menu | 5-Very Low | | Issue occurred on 5/1 for 2h, no customers reported the outage |
| Point of sale credit card processing terminal | Credit card transaction service stops working | Failure of service provider’s technology | 70% chance of 2 hours outage/yr | Direct loss of revenue (food given away to customers who do not have cash), loss of customer confidence | 2-High | | Issue occurred on 6/1 for 30 minutes, direct cost of $1000 lost revenue. |
| Retail on-line purchasing website | Web server goes down or is otherwise unavailable | DDOS of host provider | 50% chance of 5 hours outage/yr | Direct loss of revenue (no site = no sales), loss of customer confidence | 1-Critical | | Issue occurred on 5/1 for 2h, estimated direct cost of $10,000 lost revenue and 10 man-hours of IT time @ loaded cost of $150/hr |
| On-line credit card processing service | Credit card transaction service stops working | Failure of service provider’s technology | 70% chance of 2 hours outage/yr | Some loss of revenue | 3-Medium | Pre-configured banner asking customers to call sales line. Have sales line capture cc info and process charge later. | Issue occurred on 6/1 for 30 minutes, estimated direct cost of $400 lost revenue and 1 man-hour of IT time @ loaded cost of $150/hr. Countermeasure worked with $400 of sales redirected to sales line. |
Notice that the Vulnerability, Threat, and Probability are identical for each of the companies’ Assets, but look at the Impact to each company if the Threat materializes.
The Restaurant can withstand the loss of their website fairly well. In fact it is possible that no customers will even notice the site’s outage. That same outage affecting the on-line retailer is disastrous.
The reverse is true of the credit card processing service. To the restaurant it means that they have to give away free food. The on-line retailer has a Countermeasure in place that, although not perfect, allows them to capture customer orders and process their cards later.
Hopefully you get the idea. List out your Assets, each Asset’s Vulnerabilities, the Threats that can expose each Vulnerability, the Probability of each Threat materializing, and the Impact if it does. This will allow you to come up with a Risk value. Some companies go to great lengths to calculate the Risk value. They will put in numeric weights for each of the preceding cells, multiply them together, and then have a scale that indicates whether it is low or high Risk. I prefer to use the more “gut feel” method of just looking at it and using your professional judgment to determine the Risk.
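For anyone who does want the numeric route, here is a minimal sketch of the multiply-and-scale idea. Everything specific in it is an assumption: it only weighs Probability and Impact, each on a made-up 1 to 5 scale, and the cut-off scores are arbitrary. The labels follow the ones used in the table, with "4-Low" assumed to fill the gap between "3-Medium" and "5-Very Low".

```python
def risk_score(probability_weight: int, impact_weight: int) -> int:
    """Multiply two weights, each on an assumed 1 (lowest) to 5 (highest) scale."""
    for weight in (probability_weight, impact_weight):
        if not 1 <= weight <= 5:
            raise ValueError("weights must be between 1 and 5")
    return probability_weight * impact_weight


# Assumed cut-offs as (minimum score, label), checked from highest to lowest.
SCALE = [
    (20, "1-Critical"),
    (12, "2-High"),
    (6, "3-Medium"),
    (3, "4-Low"),
    (1, "5-Very Low"),
]


def risk_label(score: int) -> str:
    """Return the first label whose minimum score is met."""
    for minimum, label in SCALE:
        if score >= minimum:
            return label
    raise ValueError("score must be at least 1")


print(risk_label(risk_score(3, 5)))  # 3 * 5 = 15, which lands in "2-High"
```

The point of the sketch is how little there is to it: the hard part is still agreeing on the weights, which is where the professional judgment comes back in.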
And of course you should always make copious notes if the Threat does materialize. How long was the service interrupted? How much did it cost? Etc. This may cause you to update the Probability or Risk in the table.
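As a small worked example of the "how much did it cost?" question, the on-line retailer's website outage adds lost revenue to IT time at the loaded hourly rate. The function name and parameters below are hypothetical; the figures are the ones recorded for that outage.

```python
def incident_cost(lost_revenue: float, it_hours: float, loaded_rate: float) -> float:
    """Direct cost of one incident: lost revenue plus IT time at the loaded rate."""
    return lost_revenue + it_hours * loaded_rate


# Retailer's website outage: $10,000 lost revenue, 10 hours of IT time at $150/hr.
website_outage = incident_cost(10_000, 10, 150)
print(website_outage)  # 10000 + 10 * 150 = 11500
```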
Once you get a fully populated table, sort it by the Risk column and start coming up with solutions that reduce that Risk prior to the Risk materializing (Mitigations). Or maybe come up with some Countermeasures to reduce the effect of the Threat once it materializes.
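If the table lives in code rather than a spreadsheet, the sort could be sketched like this. It assumes Risk values are written as a number, a hyphen, and a label (as in "2-High"), with 1 being the worst. The four rows are the example Assets from the two companies lumped together purely for illustration; in practice each company would sort its own table.

```python
rows = [
    ("Promotional website", "5-Very Low"),
    ("Point of sale credit card processing terminal", "2-High"),
    ("Retail on-line purchasing website", "1-Critical"),
    ("On-line credit card processing service", "3-Medium"),
]


def risk_rank(row):
    """Pull the leading number out of a Risk value such as "2-High"."""
    return int(row[1].split("-", 1)[0])


# Lowest number first, so the most serious Risk ends up at the top.
by_risk = sorted(rows, key=risk_rank)
for asset, risk in by_risk:
    print(f"{risk:<12} {asset}")
```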
By the way, if you ever eliminate the Risk from an Asset/Vulnerability/Threat line, don’t erase that row; archive it. Who knows? Maybe someday in the future you can look at the lessons learned from an older Asset/Vulnerability/Threat and apply them to a new Asset.
I call this Risk Management methodology Assets, Vulnerabilities, Threats, Probabilities, Impacts, Risks, and Countermeasures – or for simplicity’s sake – AVTPIRC.
Hey, it’s better than CRAMM.