{"id":699,"date":"2014-05-16T16:49:15","date_gmt":"2014-05-16T23:49:15","guid":{"rendered":"http:\/\/itiltopia.com\/?p=699"},"modified":"2017-12-11T08:45:14","modified_gmt":"2017-12-11T16:45:14","slug":"when-disaster-strikes","status":"publish","type":"post","link":"http:\/\/itiltopia.com\/?p=699","title":{"rendered":"When Disaster Strikes"},"content":{"rendered":"<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\"><a href=\"http:\/\/itiltopia.com\/wp-content\/uploads\/2014\/05\/1420-3.jpg\"><img loading=\"lazy\" class=\"alignright size-full wp-image-701\" src=\"http:\/\/itiltopia.com\/wp-content\/uploads\/2014\/05\/1420-3.jpg\" alt=\"1420-3\" width=\"400\" height=\"355\" srcset=\"http:\/\/itiltopia.com\/wp-content\/uploads\/2014\/05\/1420-3.jpg 400w, http:\/\/itiltopia.com\/wp-content\/uploads\/2014\/05\/1420-3-300x266.jpg 300w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a>The true test of an IT organization\u2019s maturity is how it reacts when a significant issue occurs. By significant issue, I mean that the ability of a large portion of the organization (or even the entire organization) to function is severely impaired by an IT-related failure.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">What happens isn\u2019t a test of any single group or process &#8211; it is a test of all IT groups and\u00a0three tightly integrated processes: Incident, Problem, and Change.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Let me give you an example of how the perfect storm can break a business. Once upon a time, back in the 90s, I got up very early every morning to be the first person in our office so I could take calls at the Service Desk. On this particular morning, it was evident that something was wrong. 
Some people couldn\u2019t get to fileshares or email. Some users couldn\u2019t get to the internet. Some people were reporting that client\/server applications weren\u2019t working correctly. Between calls, I launched my browser and pulled up CNN. It wasn\u2019t just our organization. Others were reporting similar issues.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">I immediately called my wife and told her that something big was going on and to cut the internet lines to her office. At the time, she was in charge of her company\u2019s IT functions, and when I explained the situation, she went to the network closet and unplugged the cable to the external service provider. She then put a sign on the entry doors to her office telling everyone not to open emails from any external source.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">It was Melissa,\u00a0a worm-based virus that created massive amounts of network traffic by\u00a0sending email\u00a0out from everyone&#8217;s Outlook client. <\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Of course, we didn\u2019t know that at the time.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">All we knew was that large parts of the network were unreachable. So what did my organization do? The server team obviously thought there was a faulty card on a server sending out massive amounts of packets. The network team assumed that there was a defect in the network routing. The desktop team thought that a software update was being pushed out and that all the traffic was killing the routers. 
So everyone made countless uncontrolled changes in their own little worlds for about\u00a0six hours.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">This was pure Incident Management at its worst. &#8220;Get it up and running as quickly as possible&#8221; is the first part of Incident Management\u2019s goal. Unfortunately, the second part of the goal was completely ignored. The full goal of Incident Management is \u201cGet it up and running as quickly as possible, <i>while doing the least amount of harm possible.<\/i>\u201d<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">To say that recovery was challenging is an understatement. Our IT groups did more damage to our infrastructure in six hours than the worm could ever have done. Once the news got out that many organizations were being affected by this worm, our desktop team spent about\u00a0eight hours canvassing the organization and physically touching every PC to remove the virus.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Our network and servers weren\u2019t completely back to normal for three days.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">What Change Management was done? None. What Problem Management was done? 
None.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Let\u2019s just say that it was an excellent learning experience.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">From that point forward, whenever there was a severe issue, the Service Desk Manager (me) started a conference line and\u00a0then notified all the other IT managers (by email, phone or fax) to dial into the conference line immediately.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">There was a formal structure to the conference call:<\/span><\/span><\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Quorum called<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Problem statement defined<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Roll call made of each manager identifying how the problem was affecting their organization<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Roll call made of each manager identifying what team members were available to work on the problem<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Decision as to who owns the problem<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Decision as to who owns the communication<\/span><\/span><\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: 
Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">The problem owner would be the person responsible for:<\/span><\/span><\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Organizing the response to the issue<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Engaging IT resources as needed<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Managing the issue to resolution<\/span><\/span><\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">The person who owned the communication (usually me) would stay on the conference line and take any updates verbally from whoever called in. The communication owner would be responsible for:<\/span><\/span><\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Updating the tracking ticket<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Ensuring that all parts of IT were made aware of any updates<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Ensuring that communications to the users were sent out in a timely fashion<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Informing the company\u2019s upper management of the issue and its effect on the organization<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Being a buffer between the problem 
owner and everyone who wasn\u2019t directly involved in the resolution of the issue<\/span><\/span><\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">The problem owner has to be shielded from all the users, managers, customers, and others who want status updates or who just make the problem owner\u2019s life miserable (it\u2019s amazing how many people want to pile on pain during a painful situation). The communication owner (who is being pounded from all sides for the latest status) faces a delicate balancing act: deciding how frequently to go to the problem owner for an update, knowing that every request for a status update delays the resolution.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">As the issue is worked towards resolution, it is the problem owner\u2019s responsibility to ensure that proper controls are in place for critical infrastructure:
<\/span><\/span><\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Documentation updates noted<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Change permissions obtained<\/span><\/span><\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\">Cross-functional activities balanced<\/span><\/span><\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">One of the hardest things for the problem owner to do is to negotiate between the teams implementing quick fixes (get the users up and running as quickly as possible) and the teams responsible for determining root cause (who typically need the users to be left in a broken state so they can do their analysis).<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Depending on the nature of the event and how well the action plan handled it, the review afterward was either formal or informal. If things went well, it was usually an informal review. 
If things went badly, it was more formal.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-family: Calibri;\"><span style=\"font-size: medium;\"><span style=\"color: #000000;\">Watching how well IT works across functional boundaries, maintains its processes, and keeps sufficient controls in place during a serious outage is quite often the quickest way to uncover departmental and process deficiencies.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-size: medium;\"><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\">The real lesson here is to never let a disaster go to waste.<\/span><\/span><\/span><\/p>\n<p><span style=\"font-size: medium;\"><span style=\"color: #000000;\"><span style=\"font-family: Calibri;\">BTW: My wife&#8217;s boss gave us a very expensive bottle of wine when they came through the day unscathed.<\/span><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The true test of an IT organization\u2019s maturity is how it reacts when a significant issue occurs. By significant issue, I mean that the ability of a large portion of the organization (or even the entire organization) to function is severely impaired by an IT-related failure. 
What happens isn\u2019t a test of any single group or process &#8211; it &hellip;<br \/><a href=\"http:\/\/itiltopia.com\/?p=699\">Read more <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":701,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[5,17,8,7],"tags":[],"jetpack_featured_media_url":"http:\/\/itiltopia.com\/wp-content\/uploads\/2014\/05\/1420-3.jpg","jetpack_publicize_connections":[],"_links":{"self":[{"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/posts\/699"}],"collection":[{"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/itiltopia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=699"}],"version-history":[{"count":45,"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/posts\/699\/revisions"}],"predecessor-version":[{"id":1472,"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/posts\/699\/revisions\/1472"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/itiltopia.com\/index.php?rest_route=\/wp\/v2\/media\/701"}],"wp:attachment":[{"href":"http:\/\/itiltopia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/itiltopia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=699"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/itiltopia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}