Is Your Root Cause Analysis Effort Trigger Happy?
by Mark Latino, Reliability Center Inc.
Most work environments are
reactive in nature. There is always more work to be done in a
day than there is time to perform it. This is because
unexpected changes in the work environment force the
workforce to respond immediately, without preparation, to
return the environment to the status quo.
Because of this, some
companies have elected to do root cause analysis on these
unexpected events. Working from a reactive point of view,
management sets a trigger that launches a root cause analysis
when a threshold is crossed, such as a vibration level,
hours of downtime, or financial impact. Trigger placement is a
GOOD and necessary first step.
The reason it is a good
first step is because the natural progression is to first
realize the facility is reactive to an excessive level. This
discovery is usually through daily or weekly downtime reporting.
Once it is determined there is a problem, measures to control
the situation are implemented. Triggers are almost always the
first response. Some companies measure employee problem-solving
performance based on a weighted system of problem types. The
more times the problem recurs, the more points the employee
accumulates, and the employee is scored at year’s end. Other
companies measure employee problem-solving performance by
assigning 10 or 12 failure investigations to be completed by the
end of each year.
All this effort is based
on the activation of some undesired event. The events can range
from an electrical fault shutting down an entire section of a
facility to a critical pump breaking shafts every two months.
Why wait for triggers to trip and incur downtime and asset
damage? It is much more difficult to do root cause analysis when
severe secondary damage has been incurred. The fact is that
triggers are a reactive means of controlling unscheduled events.
The natural progression
from this new knowledge is to stop waiting for triggers to be
activated and get proactive. When this step is achieved the
facility can move to the next level, GREAT. This will also
eliminate employee pressure to deliver scores for performance
appraisals that may be done in haste to meet requirements.
Performing a Failure
Modes & Effects Analysis (FMEA) is a way to replace triggers and
show management that the root cause analysis effort is based on
sound monetary results. Each root cause analysis completed will
have a predetermined value that has been identified using the
FMEA. This is going from GOOD to GREAT. See figure 1.
Proaction
is the insight to look at operating areas with a structured
approach designed to uncover potential events that would cause a
trigger to activate. This can be accomplished using failure
modes and effects analysis (FMEA). FMEA is a term used often,
but it means different things to different people. The common
thread is that FMEA provides focus and points to
the opportunities that will deliver a pre-measured improvement
to a facility.
The FMEA is a proactive
approach to uncovering what you don’t know about your operation.
This is important because there is an assumption that we already
know what our manufacturing problems are. This is for the most
part NOT TRUE. Some may know what the worst problem is but it is
very likely that facilities don’t know what the second worst
problem is or the third, fourth and so on. In many cases we
don’t know what the problems are truly costing us because they
have been below the radar and have become a part of doing
business.
An example of this would
be a piece of equipment that makes a tangible good like a
cigarette making machine or paper converting machine. This type
of equipment can be turned on and off many times during a shift
for various reasons. Sometimes operators shut equipment down
because of quality defects, or run at reduced rates because the
full capacity rate causes excessive start-ups and shutdowns,
which in turn cause the operators to work harder than if the
equipment were run at a reduced rate.
I will use an example
from the cigarette industry because I have worked in that
environment in the past. This example could just as easily apply
to making candy, bolts, or paperclips. In the cigarette industry
there is an electronically generated downtime called a rod
break. When this condition occurs the operator will respond by
collecting the paper part of the cigarette rod and disposing of
it into the waste can. The operator will then return the tobacco
lost from the rod into the rework container, re-thread the
cigarette paper and push the start button to return the
equipment to the producing mode. This process takes the operator
about three to four minutes. The operator’s response is a task
learned during the operator’s training cycle. This
response was taking place 40 to 50 times a shift, reducing the
end-of-year productivity of the machine by more than 20 million
cigarettes. This was not on the radar screen because it was a
task that was done regularly and was considered a part of doing
the job. See figure 2.
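
As a rough check on the rod break figures, the lost time can be worked out directly from the numbers in the text (40 to 50 occurrences a shift, three to four minutes each). The sketch below is illustrative only. The function names and the figure of 250 shifts per year are assumptions made for the example, not values from the article.

```python
def lost_minutes_per_shift(events_per_shift, minutes_per_event):
    """Minutes of production lost in one shift to a recurring event."""
    return events_per_shift * minutes_per_event


def annual_lost_hours(events_per_shift, minutes_per_event, shifts_per_year):
    """Hours lost per year; shifts_per_year is supplied by the analyst."""
    minutes = lost_minutes_per_shift(events_per_shift, minutes_per_event)
    return minutes * shifts_per_year / 60


# Low and high ends of the range quoted in the text.
low = lost_minutes_per_shift(40, 3)
high = lost_minutes_per_shift(50, 4)
print(f"Lost per shift: {low} to {high} minutes")

# ASSUMPTION: 250 shifts per year, chosen only to illustrate the scale.
ASSUMED_SHIFTS_PER_YEAR = 250
print(f"Lost per year (low end): "
      f"{annual_lost_hours(40, 3, ASSUMED_SHIFTS_PER_YEAR):.0f} hours")
```

Even at the low end, a few minutes repeated dozens of times a shift adds up to hours of lost production that no single downtime report would flag.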

There are many small
occurrences of loss that happen daily in facilities and are
considered the way we do business. When these occurrences are
exposed and their annual loss is calculated in hours and dollars,
it becomes clear what the financial impact is to the facility.
Where do you get the
data to perform failure modes & effects analysis (FMEA)?
Managers and others often say our greatest assets are our
employees. However, because of technology, analysts prefer to
retrieve information from maintenance management systems. This is
a fast way to get downtime data, parts usage data, etc. Most
often the data provided by the maintenance management system is
what’s on the radar screen, or what we already know. When
below-the-radar data is sought, it must be collected from the
most likely source of undetected information, the
employees.
I spent many years in
the manufacturing of products and can confidently say the people
who operate and maintain a facility know things about their
environment that will never be made known unless they are asked.
Most employees find a way around problems that cause them pain or
extra exertion to perform work. This may include bypassing
alarms that go off for no apparent reason, running at reduced
rates, changing filters prematurely, adding set-screws to loose
couplings, pinning bearings so that they won’t move, tack
welding cracked impellers, and the list can go on and on. These
kinds of things affect productivity and most likely will never
show up in the maintenance management system.
The management can open
this door and learn from employees by following a three-step
method of performing a successful FMEA.
1. Create a process flow diagram of the system you want to
analyze.
2. Create a failure definition to be communicated from the top
tier of management to the hands-on work level.
3. Create an FMEA data collection worksheet that reflects the
issues of concern, such as material waste, defect rates, downtime,
and safety incidents.
Create a process flow diagram:
The process flow diagram
reflects the routing of the process. This is usually from the
raw material input to the point of storage or shipping. See
figure 3.

The reason for doing
this is to give the FMEA facilitator and employees a visual for
reference during the interview process.
Create a failure definition:
A clear and concise
failure definition is needed to make sure the employees and the
management have the same understanding of what is considered a
failure. Without this understanding, confusion results and your
analysis is compromised. Failure definitions are usually
shaped by the business climate, such as a sold-out condition or
a slow sales cycle.
Failure definitions can
also center on a current problem an area is experiencing, such as
a high rate of re-work, a high quality defect rate, or a high hand
injury rate. Some examples of failure definitions are:
1. Failure is when secondary defects are incurred.
2. Failure is any adverse happening that has human roots.
3. Failure is when the asset becomes inoperable.
4. Failure is when the asset can no longer perform its intended
function.
5. Failure is any event or condition that interferes with
production.
6. Failure is any event or condition that causes the expenditure
of unexpected budget money.
When consensus is gained
on the failure definition, you are ready to compile an FMEA data
collection worksheet.
Create a data collection worksheet:
The data collection
worksheet’s task is to capture the
data needed to separate the significant few failures from all
the rest. To do this a very simple rule is used: frequency
multiplied by impact. The data collection worksheet
does a number of things for the analysis: it
identifies the event, the modes that cause the event, the
frequency of each mode, and the impact of each event and mode
combination on the system being analyzed. See figures 4 and 5.
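
The frequency multiplied by impact rule can be sketched in a few lines of code. This is a minimal illustration, not the worksheet shown in the figures: the class name, field names, and every number in the sample rows are hypothetical, invented only to show how the ranking works.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    """One line of a hypothetical FMEA data collection worksheet."""
    event: str                    # the undesired event
    mode: str                     # the mode that causes the event
    frequency_per_year: float     # how often the mode occurs in a year
    impact_per_occurrence: float  # loss per occurrence, in dollars

    @property
    def annual_loss(self) -> float:
        # The rule from the text: frequency multiplied by impact.
        return self.frequency_per_year * self.impact_per_occurrence


def rank_by_annual_loss(rows):
    """Return the rows sorted from largest to smallest annual loss."""
    return sorted(rows, key=lambda row: row.annual_loss, reverse=True)


# ASSUMPTION: sample values are made up for illustration only.
rows = [
    WorksheetRow("Pump failure", "Broken shaft", 6, 5000.0),
    WorksheetRow("Line stoppage", "Electrical fault", 2, 20000.0),
    WorksheetRow("Line stoppage", "Rod break", 10000, 15.0),
]

for row in rank_by_annual_loss(rows):
    print(f"{row.event} / {row.mode}: ${row.annual_loss:,.0f} per year")
```

In this made-up data the small, frequent mode ranks above the rare, dramatic ones, which is the kind of below-the-radar loss the worksheet is meant to expose.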


The findings allow us to
uncover what we don’t know. This enables us to change the
outcome because we know what our current cost of doing business
is. I call this being in the catbird seat. The catbird seat in
this case means you can see what others can’t because you took
the time to look. Now you can make decisions based on solid
information, giving you the business advantage.
With this advantage you
can pick the project you would like to do root cause analysis on
by the loss incurred to the facility over a year’s time. This
will not be the case when reacting to a triggered root cause
analysis project. Triggered projects may in some cases tie up
valuable human assets that could be better utilized on projects
where there is greater return to the organization.
Written by Mark Latino
Mark Latino is Vice
President of Operations for
Reliability Center, Inc.
(RCI). Mark came to RCI after 19 years in corporate America.
During those years he acquired a wealth of reliability,
maintenance, and manufacturing experience. He worked for
Weyerhaeuser Corporation in a production role during the early
stages of his career. He was an active part of Allied Chemical
Corporation’s (now Honeywell) Reliability Strive for Excellence
initiative, which was started in the 1970s to define, understand,
document, and live the reliability culture, until he left in
1986. Mark spent 10 years with Philip Morris, primarily in a
production capacity and later in a reliability
engineering role. Mark is a graduate of Old Dominion University
and holds a BS degree in Business Management that focused on
Production & Operations Management.