Infocon Magazine Issue One, October 2003
Business Continuity Planning Interview with David Spinks, EDS
Interviewer: Wanja Eric Naef
Q: Usually there is confusion about the term BCP. Some people use the term continuity, others contingency and some use them interchangeably. How would you define them?
David Spinks: That is a good question. For BCP, business continuity planning, certainly I would like to use the widest use of BCP in that if you look at a typical business continuity planning study we are not just necessarily looking at the needs, wants and requirements of the business. The typical BCP study if focused on the right area will start looking at stakeholders and their values. For instance, I have done some work in the oil industry. The thing with the oil industry clearly is that they have a potential impact on the environment they are in. Very often with many oil companies I have worked for it was actually a beneficial impact, because they subsidise local schools and local education, because it is in their interest to be centred in a happy community.
So therefore the typical BCP study if conducted in the right way will consider the needs and wants of the stakeholders: communities, shareholders, owners, staff, management, and indeed all stakeholders in the corporate entity, including banks and insurance companies. Next the study will normally include some form of business impact assessment. Now if the study has been done at the right level then those business impacts will also consider the impacts of any unplanned event on the stakeholders as well as the business itself. So we are not just looking at contingency planning: generally contingency planning entails factors concerning a corporate entity, organisation and business process and attempts to predict what may happen if unplanned events occur. The recommendation will typically suggest various mitigations against those particular business impacts. But BCP is at a much higher level. It is saying that we should consider this in terms of the impacts of these things on the corporate entity and corporate stakeholders as well.
Obviously the other big confusion is with either CDR, which is computer disaster recovery, or DR, which is disaster recovery. Quite often those studies are centred on the technology rather than the business. Clearly technology has a major role to play in the business, but we are finding more and more that the computer disaster recovery is done in isolation from the business continuity planning. What we are trying to do is to bring the two things together and thus make sure that the computer disaster recovery complements the business continuity planning.
Spinks: I would like to step back from that question and say the plan is important, i.e. the material bits of paper or responding to a crisis are important. However, what is even more important than the plan itself is the planning process, which has led to the development of the plan. And what I would look for in the terms of the effectiveness of the plan is:
First, have all the business leaders been involved? Is the plan owned by the business or does the business say we delegate that to the IT department or the business-planning department or is the plan owned by the business continuity manager? If the ownership of that plan is in a technical area then it is very likely that when unexpected events happen then that plan is not going to work. So therefore the plan should involve the operational business leaders, so that they see it as their plan. So that is the first critical success factor in the plan.
The second is probably the most important and that is: is there evidence that the plan has been tested? We would look for two types of tests: we would look for desk based tests, simulations, but we would also look for real evidence that plans have been tested for real, namely that people have been taken through the motions of various crises and that somebody has independently stress-tested those tests themselves. We see far too many business continuity plans and/or disaster recovery plans that, whilst they have been tested, were tested in unrealistically ideal conditions and thus do not truly recognise what really happens in a crisis.
There are two things which happen in a crisis: firstly communications break down unless there is a good crisis communication plan, and secondly management (this is from real case histories of big disasters) are nowhere to be found. They disappear. Alternatively management become obtrusive. A crisis has two phases: the internal phase, which looks into the organisation and attempts to recover that organisation using the business continuity plan, and another phase in which the organisation looks out at the stakeholder world and manages the stakeholders’ needs and wants at the point of the crisis. And that is where the communication and PR people come in because they are the people who are going to front the press, the media, the stakeholders and the shareholders.
A very effective BCP plan operates on three levels. The first is the global level. This is where the board operates, as it looks at the long term impacts of the crisis on the business, and the board is then communicating with the press, the media and if necessary with governments at a global level.
There is a second level, which is looking at the recovery of either a site or a geographical area.
And the third level is looking at putting out the fire, because that is quite important. It is like a bath with the taps running. Put the plug in and make sure the water doesn’t leak out and then turn the taps off. That is the operational response. What we find is that too many technical people only consider the third level. They only consider the technical response to it and they forget that in a major crisis somebody has to be at the gates talking to the press, talking to the media, reassuring the local community, dealing with the longer-term aspects of an event.
Now that model builds on the UK Cabinet Office and Home Office emergency response gold, silver, bronze scheme, and we have seen that work in operation a number of times and it worked very successfully.
Spinks: The King’s Cross Fire in the UK was managed on a bronze, silver and gold operational level. And there are some pretty good case histories written about that and how to deal with it.
Another example was the operation put into place by Greater Manchester Police after the Manchester Bomb. And that had all sorts of lessons to be learned on how to manage a crisis.
Spinks: There are enough codes of practice in place for senior management not only to support business continuity planning, but actually to push people to deliver it, because at the end of the day the board of directors is held responsible in times of crisis if there is not adequate planning. In the UK, we have generally recognized practices such as the Turnbull Combined Code of Practice which makes various commitments on both executive and non executive officers of organisations to manage not just financial risk but also operational, environmental and safety risks. And one of the key responses to risk management is business continuity planning. If they are not supporting BCP plans the needle will move rapidly from innocent to guilty. And I think there are enough cases in place now, enough proof in place, to suggest to any peer groups of executive officers that they need to take business continuity planning seriously. So that is a bit like a stick. The carrot, and it is important to have a carrot as well, is that organisations that have taken business continuity planning seriously will reap benefits. For example, the big car manufacturers are excellent in extracting performance benefits out of the supplier chain and I would suggest that whilst there might well be an initial cost for large car manufacturers in implementing BCP, if they then put that down their supplier chain they will see significant improvements from their suppliers. And where they have just-in-time contracts, they also gain a higher level of confidence in key suppliers: come what may, whatever happens to that supplier, they will be able to continue the supply of spare parts through any crisis the supplier has.
Spinks: No. BS7799 is an information security standard and the latter sections do mention BCP and they emphasise that it has to be risk based and that testing is required, but they do not go into any detail. There are other codes of practice available for BCP, which go beyond 7799. It is excellent that it is in there, but 7799 is really primarily focused on information technology. So the natural progression for people looking at that is to take section 11 into a computer disaster recovery plan rather than a true BCP.
Spinks: I think it depends on the particular organisation: which sector they are in and what their regulators need. For instance, if you take financial services you would look for some quantification of risk for financial services. They will have to do that before 2005 anyway to meet the Basel II requirements. So unless they are prepared to put huge sums of money into the capital pot, they have to begin to quantify risk. On the other end of the scale, if you look at a purely commercial organisation where it is administrative or there is a low threat of loss then maybe just a qualitative review of risk is enough.
But the important thing is that BCP is not just based on risk assessment. Risk assessment is important, but what is even more important is risk management. It has to be part of a risk management process. It is the easiest thing in the world to assess risk, because in information security you are looking for threats, vulnerabilities, protective measures and you are looking for the likelihood of any particular threat happening.
You can break it down into two sorts of risks: high likelihood, low impact risks and high impact, low likelihood risks. And it is the very high impact, low likelihood risks where you need the business continuity planning safety net, because you are not going to be able to afford to mitigate those collections of risks. For example, take the oil company we spoke of. An oil company can do certain things to mitigate risks of aviation crashes on their installations, but only so much. So therefore what you do with an oil installation is say ‘well it is not very likely that a plane will crash on my refinery, but if it does I am going to put all these processes in place to respond.’ And similarly with a bomb attack. You can do certain preventive measures to keep people away from boundaries. You can put up boundary fences. You can make sure that cars do not come into the site. All those will mitigate, but at the end of the day you will need a business continuity plan to account for ‘what happens if this and this happens.’
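The split described above can be sketched in a few lines of Python. This is an illustration only: the likelihood scale (an annual probability), the 1-to-5 impact score, both thresholds and the treatment labels are assumptions made for the example and are not taken from the interview.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LIKELIHOOD_THRESHOLD = 0.1  # annual probability at or above which a risk counts as likely
IMPACT_THRESHOLD = 4        # impact score (1-5) at or above which a risk counts as high impact


@dataclass
class Risk:
    name: str
    likelihood: float  # assumed annual probability, 0.0 to 1.0
    impact: int        # assumed severity score, 1 (minor) to 5 (catastrophic)


def treatment(risk: Risk) -> str:
    """Suggest a broad treatment for a risk from its likelihood and impact."""
    likely = risk.likelihood >= LIKELIHOOD_THRESHOLD
    severe = risk.impact >= IMPACT_THRESHOLD
    if severe and not likely:
        # Too costly to mitigate fully: this is where the BCP safety net sits.
        return "business continuity plan"
    if severe and likely:
        return "mitigate and plan"
    if likely:
        return "mitigate"
    return "accept and monitor"
```

Under these assumed thresholds, the aircraft-on-the-refinery case (very unlikely, very severe) falls into the "business continuity plan" category, which is the point being made in the answer.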
Spinks: We are in our second cycle. The big interest in BCP came in about 1998 when we were planning for the millennium Y2K, when virtually every utility, every major company in the UK had a pretty good BCP plan for multiple simultaneous failures. And that was an excellent period for BCP. Since then we have noticed that some of these plans have actually not been maintained; the people that built the plans have gone on to do other things. So there had been almost an erosion in the BCP activity until September 11th. And many organisations have looked at those events and governments have looked at those events and taken a view that firstly we have to prevent them. If we cannot prevent them, we have got to have plans in place to respond, because these types of terrorist attacks are pretty difficult to prevent if somebody is really intent on getting into your company and setting it alight or exploding a bomb. It is pretty difficult to defend against that, particularly when they are taking the measures and doing the things that they are. Therefore it is even more important to have business continuity in place. There has been a great deal of additional activity in BCP and there are two absolutely brilliant case histories that we are putting together in EDS. The first is the work that we did for a major financial institution. Our customer was based in the World Trade Center and we had that financial service customer up and running within 24 hours of that event. The second one is we actually had teams of people working in the Pentagon, luckily not in the area where that aircraft hit, and almost within 60 minutes of that event we had teams of people working for the U.S. government and the Department of Defense recovering some of those systems. And I think everybody learns in those events, but I think that we responded as a corporation extremely well.
Spinks: If you look at that type of scenario, i.e. terrorists acting within a community, certainly within the UK, we will then go to a different group of people, because what we do is to go to the emergency planners. And most good BCPs for most significant organisations will have an interface with the local emergency planning community. The emergency planners are part of the Home Office infrastructure; they are based in the local authority and their job is to look at scenarios like major explosions and other such events. They have plans in place to respond to those events and needless to say they are based on a gold, silver, bronze structure. They will involve the police, ambulance and other emergency services.
Corporately, i.e. within a BCP, the BCP manager is simply going to be networking with the emergency planning officers and saying ‘we are doing this and we are going to take part in your emergency plan in the event of a flood, a fire, an explosion in the community’. And those two things will loop together.
For the big organisations, for instance, I was involved in doing some BCP for a major oil company client and when we did the stress testing of some of the continuity plans we actually involved the local emergency planning officer, the police, the ambulance, and the fire service. My previous employer was a company called AEA Technology and we wrote a number of plans for nuclear sites and they did exactly the same thing.
Spinks: Very small companies may be better equipped to respond to a crisis than larger companies because they can do things quickly. Their communications quite often withstand the crisis and that is the thing which causes most pain – the breakdown in corporate communications and/or the communications between the corporate entity and the press and the media. So in some ways smaller organisations have less of a problem. The organisations that concern me slightly are at the top end of the SME market, where you look at organisations which employ a hundred to four hundred people. They may be multi site and they may be largely dependent on information technology, yet they have not invested in information security and hence not in BCP. We have one or two examples where SMEs at that stage in the dot.com category have suffered because they were unable to respond to a loss of reliability, for instance on their web sites. They only have one web site. As soon as their web site fails for whatever reason, they are almost out of business. And they have not invested for a number of reasons in either information security or business continuity planning. And that is the type of company which might be a supplier of goods and services to a major organisation. And that is where we go back to the supply chain assessment. It is really important to push information security and BCP down the supplier chain.
Spinks: Many organisations are now recognising the benefits of partnership. We keep hearing that large corporate clients who are good at partnering have good profitability and good resilience to unplanned events. And that is what a crisis is – an unplanned event.
If you look at establishing a good partnership, it’s based on trust. Trust is built on knowledge – how does that company operate, how do we operate? And the trust is built by knowing that whatever happens between the two entities that the supply chain is going to survive. Quite a lot of it is about awareness, but at the end of the day if you establish a relatively flexible contract, part of the contract process is an audit and / or a review. And you are not just auditing the company for security or for a BCP, you are auditing the supplier against a whole range of criteria which really should include a quick look at their BCP, if they have one and whether it has been tested as part of due diligence in operating that contract. I would place it in that category. And then you can reach a partnership.
Even for the large suppliers this can be employed successfully. I am not saying for every supplier, if I remember correctly General Motors had at one time 40,000 suppliers, and you just cannot deal with that number. But one thing the core organisation can do is actually to rank its suppliers on criticality and not on value, not on revenue, but on criticality. There is case history to help us here—take the case of Ford in the UK and door handles. That was a major problem for them caused by a relatively small supplier in revenue terms, but that component loss stopped the production line. So it is ranking the supplier on criticality and then working with the most critical suppliers and building those suppliers into your BCP tests and plan. That is not a huge issue as it can be done relatively easily. But if you have 40,000 suppliers, you probably look only at the top hundred or top two hundred again for criticality reasons. That should be part of risk assessment anyway.
If you look at a typical car plant: part of the risk assessment would be to look at what could stop the suppliers’ parts, because if the suppliers’ parts stop, the line stops. So therefore that is clearly a critical risk and part of the mitigation clearly is making sure that you either have a duplicate supplier or a standby supplier, or that the supplier has processes and procedures in place to respond to his fire, flood, or his loss of IT. Simple stuff really.
Spinks: It goes through a number of phases and it is the initial phase, which is the stakeholder values and business values, that needs to be done at board level with senior management involvement. Having defined those business objectives and stakeholders’ issues, the plan is worked through very much with each operational manager, because effectively what you are doing is looking at each business process. And then you must work out a plan for each business process recognising the dependencies across them. So you will be working with the senior managers at the operational level, because at the end of the day those are the people who run the company. The accountants actually do not run the company. They make sure that the books are balanced. The IT people should not be running the company; they should just be providing the IT. But at the end of the day you need to involve the whole peer group at that level to make sure that you have got the impacts correctly assessed. Accountants can help to do the quantification of value of assets. IT people clearly have a major role. So you should look at involving the people from the operational manager right down to the frontline.
Spinks: I would go back there and say that the starting point for looking at resilience for IT systems is a checklist based on BS7799, because it is a pretty good starting point for understanding. And of course we are looking at three aspects of information security: confidentiality, integrity and availability. Business continuity is focused largely on availability and reliability.
Major disasters quite often impact the availability side of security and reliability. However, we have got a number of case histories from financial services where the largest loss potentially is not anything to do with direct loss of IT. So the largest losses to date have been losses of confidentiality in the area of privacy where companies have been severely embarrassed by losing personal data. If we look at evaluation of assets the one asset which is most often missed is either brand value or reputation value. What happens if we lose the integrity of our web site and personal data gets exposed? That is a significant loss, but it is a loss against reputation and it is quite difficult to assign a precise value to that.
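As a rough illustration of ranking suppliers on criticality rather than on revenue, here is a minimal Python sketch. The scoring rule, the field names and the example suppliers with their figures are all hypothetical, invented for the example; a real assessment would draw on the business impact analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Supplier:
    name: str
    annual_spend: float     # what we pay the supplier (hypothetical figures)
    stops_production: bool  # True if losing this part halts the line
    has_alternative: bool   # True if a duplicate or standby supplier exists


def criticality(s: Supplier) -> int:
    """Hypothetical score: 2 = line-stopping and single-source,
    1 = line-stopping with a fallback, 0 = not line-stopping."""
    if not s.stops_production:
        return 0
    return 1 if s.has_alternative else 2


def most_critical(suppliers: List[Supplier], top_n: int) -> List[Supplier]:
    """Return the top_n suppliers by criticality; annual_spend is deliberately ignored."""
    return sorted(suppliers, key=criticality, reverse=True)[:top_n]


# Invented example data: a small supplier in revenue terms can still be the most critical.
EXAMPLE_SUPPLIERS = [
    Supplier("Engine block foundry", 5_000_000, True, True),
    Supplier("Door handle maker", 40_000, True, False),
    Supplier("Office stationery", 10_000, False, True),
]
```

With this invented data, `most_critical(EXAMPLE_SUPPLIERS, 2)` puts the door handle maker first despite its small spend, and those top-ranked suppliers are the ones to build into BCP tests.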
Spinks: That is really an easy question to answer believe it or not. Because what you need to do is simply to look back on the case histories and learn from the case histories. And I will give you two. Firstly there was Three Mile Island. Three Mile Island was a communications disaster, because whilst we had technically a nuclear power plant that was going critical, the people who were managing that stopped communicating. So much so that the state’s representative responsible for issuing an evacuation order heard the evacuation order that he should be issuing to the local public announced on radio. Somebody else just issued it on the radio and people just started evacuating and the guy who was responsible for it was listening on the radio.
The second thing is that bad communications with the press and the media can actually hinder the operational recovery. Because what happened in Three Mile Island is the press and the media did two things: firstly, they hired a helicopter to fly over the plant which is a huge risk. The last thing you want in a nuclear incident is a helicopter flying over your plant. The second thing they did, which can be managed, was that as workers came out of the plant, the press & the media were there waiting for them and a number of employees spoke to the press. That should have been addressed at a very early stage and the whole communications infrastructure within Three Mile Island just broke down completely. This is a really good case history to say if we are going to have good press & media communications there are a huge number of lessons to be learned from that.
Another case example of serious damage being done to an organisation by lack of good communications was when Challenger blew up and we lost seven astronauts. NASA took one hour from that blowing up to get in front of the press. And do not forget the press & media were sitting there when it happened and they took an hour to confront the press and the media and do you know what she said? ‘I think we lost seven astronauts.’ And that was the total sum of the statement to the press & media. Now that was not so bad actually, pretty bad, but there was a man in the Pentagon sitting watching this on television and they had not given him any information either. NASA took two years to recover from that poor communication, because their bosses at the Pentagon just went berserk. And that was because they were just sitting there waiting for somebody to contact them and let them know what happened and they did not. So that was very serious for NASA.
There is a whole raft of examples here in the UK. We have seen major safety incidents managed well, but messed up because the person fronting the press & media has not had the right training and/or is an inappropriate spokesperson. We have seen board directors and executive officers standing up doing personal interviews live on television. From recent personal experience you do not do that in a crisis. You let a professional do it.
Charlotte Steele, EDS PR Manager: I have been involved in a number of crises in my role. As a public relations professional the most important activity is the delivery of regular, accurate information. And you have to admit responsibility wherever it lies. If you follow those guidelines, the communication should contain the crisis from the stakeholders’ point of view. So you are alright and you have the press on your side; if the press is against you the company’s reputation will be stained for a long time.
Spinks: Reputation management: the biggest losses have all been in that area. And however much people will say it is the technical recovery or it is the IT, it is not. It is the PR guys who have a major role at that point in time. And they allow people like me to do our job and recover systems.
I was involved in three major incidents over the last ten years. Two I cannot talk about, but one I am happy to share with you. It was a non-nuclear plant and we had a theft of equipment. Not only was the equipment stolen, but it was stolen from a system which was a real time system monitoring the quality of air across the UK. And I got a call at four o’clock in the morning. And if you look at sleep patterns they are at their deepest at four o’clock. The first question I asked was, ‘why is my member of staff going into work at four o’clock in the morning’ - I never did find that out. He went into the office and was presented not just with a whole raft of empty PCs, but all the chips had been taken out and the major server systems had been taken away and, what is worse, these guys had covered their tracks by pulling a hose. So there was water everywhere. And he rang me up and said, ‘I have got a problem. In fact we have a problem. In fact AEA Technology has a problem. We’ve had a theft.’ I said, ‘well tell me about it. Have you turned the water off?’ ‘No.’ I said, ‘go and turn the tap off.’ Literally the water was still spraying around the office. And then we put the BCP into place which was firstly: ring the head of physical security. ‘You need to get there. Don’t worry I will go down the list.’ So you get the head of security in because the first thing which needs to happen is to cordon off the area. Nobody goes in because the police and investigation officers will need to get in there first. So that is the first thing. Second thing is a call to the head of PR, ‘Kevin we have a problem. Get out of bed and get writing a press release because we lost the air quality monitoring and we need to issue it to the press. Clear it with the client. Tell the client. You manage the client and the press. Get on with it.’ And that was it. We activated that within ten minutes of finding out.
That recovery went according to plan and within eight hours we had all the systems back up and running. That is not bad from scratch.
Spinks: I think BCPs as they stand at the moment may have to change. Where I see a growth and where we have to put systems in place is in risk management. We need the risk management in place and it could be that part of that risk management is a BCP or it might also be better security, better mitigation, better insurance, better resilience and if we call that BCP then okay. BCP will continue to play a role in running organisations. But I think we will need to think about who is involved, because you have got to get it right and calling it BCP might be too IT-ish. So we may have to call it something else. But yes, it has a vital role particularly in organisations where there is a threat to either the environment or a threat to safety.
[Figure: an escalation scale for the UK government to respond to incidents.]
Last modified: 30 December, 2007 by Wanja Eric Naef
IWS Copyright © 2000 - 2008