Solving the Problem

Overview

Now that we have described the problem using metadata and a clear description, let's elaborate on it and perform the forensics needed to develop appropriate solutions, recommend the best one, and work out how we might implement it.

Sounds easy enough, but this is where the rubber meets the road.

In this final installment of the A3 Problem Solving Method, we look at the forensics needed to solve the problem and recommend a solution. We do this through a thorough description of the problem, why it is important to solve, what our world looks like when it is solved, and the recommendations for implementing the best possible solution. Along the way, we will use techniques that give insight into the genesis of the problem.

Let's get started.

Getting to the Heart of the Matter

Background Information

You would think providing a description would be enough, right? Well, not really. Providing the description of the problem is only that. We need to know why this is a problem.

Knowing why this is considered a problem helps the team build the business case for prioritizing the fix and for justifying what the fix will cost. That's a lot of gobbledygook! Let's go back to the many problems we need to solve daily and the limited resources we have to solve those problems.

When problems occur, it is important to know the full impact on our business. Without this, we run the risk that the reader of our problem will not know why it is important. Sure, they can take our word for it, but in today's world, expressing the problem in terms of business impact goes a long way toward getting it prioritized.

Here is an example:

PROBLEM SUMMARY STATEMENT

Power outages in the ITFN coverage areas are causing network connectivity problems. Power outages can be induced by storms, equipment failure, provider failure, malfunctioning generators, etc.

PROBLEM BACKGROUND STATEMENT

Power outages have always been an issue, particularly environment-related outages. A power outage can impact anywhere from 2 to hundreds of people, depending on location. It can also impact people's work, monitoring of equipment/flow, and production, among others. Critical systems are on generator backups (TOAS), which are mostly unaffected unless the generator fails. Communications towers, which are critical for high-producing wells/facilities, may be impacted, especially if they are not on generators: 33% on generators leaves 67% vulnerable.

There are 3 orders of backups available today; however, not all are implemented at each location:

UPS (hours), DC Battery Systems (days), and Generators (until fuel runs out). There are some solar power units at wells where power does not exist in the field.

Note how much additional detail is provided in the Background statement, particularly to explain the impact on our business. Important numbers (33% on generators) give the reader a clearer picture of the scope of the impact and of what we currently have available (3 orders of backups).

In the Background, we don't solve the problem. To come up with a solution, first we need to know where we are on the map (the Current Condition) and where we need to go (the Target Condition, or the requirements to fix the problem).

Map Point A

You remember the good ol' days when we used maps? You would go to your local AAA office and get a map, or sometimes many maps, that got you to your destination. Of course, to get to your destination, you had to know where you were! That's point A on the map. On the road to solving the problem, we'll get more details on point A for the problem.

In A3, we can describe our Current Environment by providing the number of impacted areas or employees, the systems and the roles they play in our network, the ways in which errors manifest, and the steps already taken to solve the issue that have not worked or are only partially implemented.

Once you know where you are, we'll discuss…

Map Point B

Where you are going. Your destination. Yes, in the world of solving problems, this destination might seem obvious to the casual observer, but to the sleuth, point B is the descriptive, destination-defining detail!

How systems and applications act normally is documented in the Target Environment. This is where we describe what must be achieved, based on what has already been established; in essence, these are our requirements. Of course, after any number of years, it may be difficult to dig those original requirements out, in which case we can draw on normal operations. There is a huge word of caution here: when specifying requirements (the target environment), we cannot burden the problem solution with new requirements that we want to achieve. The reason is simple.

The problem we are trying to solve is a "defect" that has occurred in production. Defects are expenses that we pay to keep systems up and running. When we add new requirements, we are enhancing our production - a cost that is revenue-generating and not an expense. In essence, defects and requirements are on two sides of the P&L statement and should not be confused.

With the requirements specified and your destination laid out, we embark on …

Navigating the Open Road

As we drive from one road, highway, and interstate to the next on our journey, we discover our world of unknowns, much as you would in the journey to define the probable causes of a problem. In this stage, we look for all possible causes by investigating system logs and individual system behaviors, running some low-level black box tests, and perhaps even reaching out to experts for professional consultation. In this stage, we perform a Gap Analysis between the Current Environment and the Target Environment. We can use techniques such as fish-bone diagramming to derive categories of problem areas and to detail each area. Fish-bone diagrams are also a great way to keep the problem-resolving team focused on the effect while finding the causes (that's why the fish-bone diagram is also called a cause-and-effect diagram). While these techniques are available, they still require time from us and from the problem-resolving team.

It's about talking through the effect and working backwards to the cause. We may be tempted to stop at the first cause we discuss and move on to developing a solution. But here is the thing … the first cause we come to is only the beginning of getting to the Root Cause.

Root Cause

The Root Cause of a problem is the thing that started the chain of events leading to the observation of the error. Oftentimes, what we see is not the genesis, and it is our responsibility to know the genesis because the solution lies there. Said another way, if we are not fixing the genesis of the problem, then we can expect the problem to occur again.

Quick analogy? Inside my house, I see the paint on the wall is sagging. As I try to find the cause, I notice the dry wall is soft to the touch. In fact, I bet I can push it in with my finger if I push a little harder! So, I could conclude that the soft dry wall is what causes the paint to sag and stop there to just replace the dry wall. However, I then cut the old dry wall out only to find there is moisture inside the wall. Okay, more details. To continue my investigation, I follow the moisture up the wall into the attic, where I find water spots on the joists and on the ceiling visible from the attic. Let's go up further to the roof, but now I need to call a professional who can climb a tall ladder and inspect the shingles, roofing paper, hips, valleys, soffit and fascia. After the expert has analyzed all of these components, I find out that a facade connection on the roof has finally opened up enough to allow rain water into the house! Now I have the root cause, which I can then plan to repair (with the help of experts, in this case) before I clean the attic, replace the wet insulation, replace the dry wall and repaint the wall!

Determining root cause gives us opportunities to solve the right problem.
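
For readers who like to see the idea in code, here is a minimal Python sketch of walking from an observed symptom back to its root cause. The cause map simply restates the house analogy, and the names CAUSED_BY and trace_root_cause are hypothetical, made up for this illustration. It is one way to picture the chain, not a prescribed A3 tool.

```python
# Hypothetical cause map built from the house analogy: each observation
# points to the deeper cause found behind it.
CAUSED_BY = {
    "sagging paint": "soft dry wall",
    "soft dry wall": "moisture inside the wall",
    "moisture inside the wall": "water spots in the attic",
    "water spots in the attic": "open connection on the roof",
}


def trace_root_cause(symptom, caused_by):
    """Follow the chain of causes until no deeper cause is recorded."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in caused_by:
        deeper = caused_by[chain[-1]]
        if deeper in seen:  # guard against a circular cause map
            raise ValueError(f"circular cause chain at {deeper!r}")
        chain.append(deeper)
        seen.add(deeper)
    return chain


chain = trace_root_cause("sagging paint", CAUSED_BY)
root_cause = chain[-1]  # the last link is where the fix belongs
print(" <- ".join(chain))
```

Stopping at the second link ("soft dry wall") is the trap described above: the wall gets replaced, and the paint sags again after the next rain.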

Implementing the Solution

Driving to the Solution

Just like finding the cause, finding a solution is a multi-step process. First, determine all the possible solutions that would fix the root cause. This can involve brainstorming with the right people and keeping all options on the table. In brainstorming, some suggested solutions may sound weird. Some might be spot-on. In any case, be sure to document all the possible solutions.

In Problem Management terminology, we call these Countermeasures.

The Solution

Of all the Countermeasures listed, the team now has to determine which one to implement. It might be tempting to list more than one for the same category of problem; however, what you likely have are sub-tasks of the main solution. We'll get to that in the final stage of this article, but for now, drive to one solution for the one root cause.

Solutions must be evaluated for their effectiveness to solve the root cause. Ask yourself: "Will this solution solve the current problem?" "Will this solution prevent other occurrences of this problem?" "How much will this solution cost us?" "How many people-hours are needed to implement this solution?" "What is the lifetime of this solution?" "Is there already a project underway to change this system that might render the solution useless?" "Is this solution simple and elegant, or are there a lot of moving parts?"
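
If the team wants a repeatable way to compare Countermeasures against these questions, one option is a simple weighted score. The Python sketch below is only an illustration: the criteria names paraphrase the questions above, while the weights, the 1-to-5 rating scale, the option names and every rating are assumptions made up for the example. None of it is prescribed by the A3 method.

```python
from dataclasses import dataclass

# Hypothetical criteria paraphrasing the questions above; the weights are
# illustrative assumptions, not values prescribed by the A3 method.
WEIGHTS = {
    "solves_problem": 3,
    "prevents_recurrence": 3,
    "low_cost": 2,
    "low_effort": 2,
    "long_lifetime": 1,
    "no_project_conflict": 1,
    "simplicity": 1,
}


@dataclass
class Countermeasure:
    name: str
    ratings: dict  # criterion -> team rating from 1 (poor) to 5 (good)

    def score(self, weights=WEIGHTS):
        """Weighted sum of the team's ratings; unrated criteria count as 0."""
        return sum(w * self.ratings.get(c, 0) for c, w in weights.items())


def rank(countermeasures):
    """Return countermeasures ordered from highest to lowest score."""
    return sorted(countermeasures, key=lambda c: c.score(), reverse=True)


# Placeholder ratings for two made-up options, for illustration only.
options = [
    Countermeasure("Option A", {"solves_problem": 5, "prevents_recurrence": 4,
                                "low_cost": 2, "low_effort": 2, "simplicity": 3}),
    Countermeasure("Option B", {"solves_problem": 3, "prevents_recurrence": 2,
                                "low_cost": 4, "low_effort": 4, "simplicity": 5}),
]
best = rank(options)[0]
print(best.name, best.score())
```

A score like this should start the conversation, not end it. If the numbers disagree with the team's judgment, that is a cue to revisit the ratings or the weights.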

When the team believes the solution is the right solution, we need to plan it out.

The Plan

We're not good at planning. Let's face it, no one really likes it, right? Well, whether you are good at it or not, you still need to know what needs to be done, when, by whom, how long it will take, and in what order. Sometimes, a simple list of actions is enough to get everyone on the same page. If the solution is more involved and has multiple steps, we'll need major activities, sub-activities, predecessor or successor relationships, etc. Anything to make sure we have a plan to implement the solution. Remember, we are putting this on an A3-sized paper. With only that much real estate, we need just enough planning to convey the approach. If we need additional details in the plan, that can be completed using another tool, but let's use the A3 to galvanize the team … especially now that we have Point B in our sights!
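
To make predecessor relationships concrete, here is a small Python sketch that turns a list of activities and their predecessors into a workable order. The activities are borrowed from the house analogy earlier in the article. The specific dependencies and the names plan and ordered_steps are assumptions for illustration. It relies on the standard library's graphlib module, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each activity maps to the set of activities that must finish first.
# The activities come from the house analogy; the dependencies are assumed.
plan = {
    "repair roof connection": set(),
    "clean attic": {"repair roof connection"},
    "replace wet insulation": {"clean attic"},
    "replace dry wall": {"repair roof connection"},
    "repaint wall": {"replace dry wall"},
}


def ordered_steps(predecessors):
    """Return activities in an order that respects every predecessor."""
    return list(TopologicalSorter(predecessors).static_order())


steps = ordered_steps(plan)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

A numbered list like the one this prints is often all the A3 needs. Durations, owners and dates can live in a fuller planning tool.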

Closing

On this journey of the A3 Problem Solving Method, we learned how the method came to be, why we use it at Oxy, and how it fits in with the ITSM framework. We reviewed the basic metadata required to define the problem, then continued to describe the problem and what we want to achieve when the solution is implemented. We looked at Root Cause and at implementing the right solution.

Look for future articles on A3-related topics. In the meantime, check out the references if you're interested in learning more about the A3 Problem Solving Method or ITSM.