I’m having some issues wrapping my brain around this particular scenario, given the information I found in the ITIL V3 definitions.
Known Error:
A Problem that has a documented Root Cause and a Workaround.
Workaround:
Reducing or eliminating the Impact of an Incident or Problem for which a full Resolution is not yet available.
The scenario goes like this:
Multiple users call in to report an application crashing during synchronization, and the same error message is seen each time. For each user calling the Service Desk, a new Incident record is logged. After a multitude of calls, a parent Incident is created and all the other Incidents are linked to it as children. After troubleshooting without reaching a resolution, the Incident records are escalated to Tier II. Once a workaround has been found, the Incident triggers a ‘Known Error’. The workaround is applied to all affected machines to restore service, and the Incidents are closed. A Problem record is created to determine the root cause of the Incidents.
This would seem to make sense to me, but by the ITIL definition, I cannot create a “known error” until I have both a workaround and a root cause. Maybe I am just confused about the terminology, but I’d like to add the error and its workaround to a known error database so that subsequent calls can be related to that known error database entry.
This very question has often come up when I have been discussing the Workaround concept as it applies to Problem and Incident Management. What I usually say (and this is an adapt principle) is that Incident Management comes up with many (in fact, probably the majority) of Potential Workarounds. If I’m trying to get a critical IT Service up and running, and I figure out that rebooting a server fixes it, then I log the resolution to the Incident as “Reboot server” and, if my tool supports it, flag it as a Potential Workaround. The issue may come up over and over, but I can always apply my Potential Workaround to get it going again.
It is just a Potential Workaround because, until Problem Management goes in and looks for root cause, it’s just an IT cowboy throwing darts and one happened to hit. Problem Management, in their investigation, may do root cause analysis and find out that a hung service on the server is causing the outage, and that a much less disruptive fix would be to simply stop and restart the service. Now it is a true Workaround. So why the difference?
When I’m doing Incident Management, the first thing I search is the Known Error Database for Workarounds. Those have at least had some research behind them (maybe rebooting the server really was the best solution) and have been vetted by someone investigating root cause. If I don’t find anything, I search the Incident database for Potential Workarounds. If I get a hit, I can try it and see if it works in my situation.
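To make that search order concrete, here is a rough Python sketch. It is only an illustration: the record fields, the substring matching, and the function name find_workaround are all made up for this example and do not correspond to any particular service desk tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KnownError:
    symptom: str
    workaround: str  # vetted by Problem Management


@dataclass
class Incident:
    symptom: str
    resolution: str
    potential_workaround: bool = False  # hypothetical flag set by the Service Desk


def find_workaround(
    symptom: str, kedb: List[KnownError], incidents: List[Incident]
) -> Optional[Tuple[str, str]]:
    """Return (kind, text), preferring vetted Workarounds over Potential ones."""
    needle = symptom.lower()
    # 1. Known Error Database first: these entries have been investigated.
    for known_error in kedb:
        if needle in known_error.symptom.lower():
            return ("workaround", known_error.workaround)
    # 2. Fall back to past Incidents flagged as Potential Workarounds.
    for incident in incidents:
        if incident.potential_workaround and needle in incident.symptom.lower():
            return ("potential workaround", incident.resolution)
    # 3. Nothing on file: troubleshoot from scratch.
    return None
```

The point of returning the kind along with the text is that whoever applies it knows whether they are using something vetted or just something that happened to work once.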
When I’m working Problem Management and looking for Incidents that are good candidates for a Problem record, I look for Incidents which are flagged with Potential Workarounds. These are tickets where the issue isn’t well understood, and the Potential Workaround may deserve some investigation to improve it to a true Workaround.
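A similarly rough sketch of that candidate search, again in Python. The dictionary keys, the function name problem_candidates, and the min_count threshold are assumptions for the sake of the example; ranking by how often a symptom recurs is one plausible way to prioritize, not something ITIL prescribes.

```python
from collections import Counter


def problem_candidates(incidents, min_count=2):
    """List (symptom, count) for symptoms with flagged Potential Workarounds.

    `incidents` is a list of dicts with hypothetical keys "symptom" and
    "potential_workaround". `min_count` is an arbitrary cut-off.
    """
    counts = Counter(
        incident["symptom"]
        for incident in incidents
        if incident.get("potential_workaround")
    )
    # most_common() sorts by count, highest first.
    return [(symptom, n) for symptom, n in counts.most_common() if n >= min_count]
```

Setting min_count to 1 simply lists every symptom that has a flagged Potential Workaround, which matches the plain "look for flagged Incidents" approach.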
By the way, in ITIL v3, a Known Error record can be opened at any point in a Problem’s lifecycle regardless of whether or not you have Root Cause or a Workaround identified (or even after the fact during a post mortem analysis).