Can You Have a Known Error without Root Cause?

March 28th, 2014 | Posted by Don Boylan in Problem Management

Q:

I have a query about defining known errors.

Generally, I would take a known error to mean that you have found the root cause of the problem (diagnosed what the issue is) and have a potential workaround.

However, I have a grey-area situation. We know what the issue is (a database is corrupted), what the workaround is (rebuild it), and how to fix it permanently (the system runs on legacy infrastructure and will have to be replaced at some point in the next few years), but we don’t actually know what originally causes the database to become corrupted.
Am I correct in believing that, because the root cause has not been identified, I cannot record this as a “known error”?

A:

One of the principles of the ITIL Framework is to take away the IT Cowboy mentality. In the old days, we would do exactly as you describe: loosely define an issue and then take a shotgun approach to fixing it. And many times we were wrong.

The reason Problem Management is so specific about requiring that the Root Cause (and the Configuration Item at fault) be identified is that it forces us in IT to determine accurately which component is failing before we write up the Request for Change.

It may be that an untrained user is doing something in the app that should never be done during the production day, and that this is causing the corruption. In that case, the Root Cause is a procedure that is being followed, and the CI at fault may be the Training Material or the New User Training Syllabus.
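For readers who track Problems in a tool, the rule described above can be sketched in a few lines of Python. This is only an illustration: the `ProblemRecord` class and its field names are hypothetical and do not come from ITIL or from any particular ITSM product. It encodes the test as this answer states it, namely that a Problem counts as a Known Error only once both the Root Cause and the CI at fault have been identified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    """Hypothetical problem record; the field names are illustrative only."""
    summary: str
    workaround: Optional[str] = None
    root_cause: Optional[str] = None
    ci_at_fault: Optional[str] = None

    def is_known_error(self) -> bool:
        # Rule as stated in this answer: both the Root Cause and the
        # CI at fault must be identified. A workaround alone is not enough.
        return bool(self.root_cause) and bool(self.ci_at_fault)


# The asker's situation: issue and workaround known, cause unknown.
db_problem = ProblemRecord(
    summary="Database becomes corrupted",
    workaround="Rebuild the database",
)

# What the record might look like IF investigation confirmed the
# untrained-user scenario (an assumption, not a finding).
diagnosed = ProblemRecord(
    summary="Database becomes corrupted",
    workaround="Rebuild the database",
    root_cause="Procedure followed by an untrained user during the production day",
    ci_at_fault="New User Training Syllabus",
)

print(db_problem.is_known_error())  # False
print(diagnosed.is_known_error())   # True
```

Under this sketch, the asker's database issue stays an open Problem, however well understood the workaround is, until someone fills in the two missing fields.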

In fact, by following your example of replacing the entire app with an upgraded version, you may bring the Problem over into the new system. Do you have to work every Problem through to resolution? No. It may be too costly to do the required investigation to identify the Root Cause and CI at Fault. Should you stop implementing a new system just because the old system still has identified Problems? No, the chances are that the majority of outstanding Problems will be addressed if a new system is implemented.

But once the new system is implemented, the old system’s Problems need to be left open for a period to see whether they recur in the new system, because, truthfully, you never did the work required to take them into the Known Error realm.

Management in some organizations thinks that having Problems that were never followed through to Known Errors is a terrible thing. They shouldn’t believe this. It is natural for a mature Problem Management process to uncover many undiagnosed Problems. It is then up to the Process Manager to determine which Problems warrant the additional resources, time, and money needed for full Root Cause analysis.
