Human Error in Maintenance and Reliability and What to Do
About It
by Jack R. Nicholas, Jr., P.E., CMRP
Abstract:
This paper describes the cost and consequences of human error in
maintenance and reliability (M & R) in a variety of venues such
as utilities, manufacturing and government. Key elements will
focus on:
- A long-term strategy and set of tactics to assure that errors
  are progressively minimized and/or mitigated for the duration
  of an organization’s existence,
- A simple analytical method to detect recurring problems that
  cause seemingly small delays or reduction in throughput or
  delivery of a service,
- Who to turn to first for solutions, especially when safety
  and/or equipment reliability are concerned and how to get
  their attention,
- What types of policies get the most cooperation from all
  levels of an organization, when the goal is to minimize human
  error and maximize profits in an increasingly competitive,
  global economy,
- A systematic root cause analysis technique that focuses first
  on the human elements rather than on the technical elements
  of the overall problem.
Introduction:
In observations from leadership positions over the past 50
years, I have seen many “new” approaches set forth as the
ultimate answer to maintenance and reliability (M & R)
improvement. Some of the promoters of these new methodologies or
approaches have attempted to appeal to all organizations in
commerce, academia and government. These ideas come and go –
mostly go – into obscurity. However, a number of fundamental
truths and fairly simple concepts seem to me to work best in all
venues. This presentation is an attempt to summarize those that
I have found to work best and to provide some guidance as to
where to find more information on the basics that seem to me to
work universally. These observations are presented in the form
of “concepts.” The concepts are presented to help illuminate the
most common human errors. Errors are committed because of
omission or ignorance of the best ways to get consistent results
from people who, in the overwhelming majority in my opinion,
want to do the best possible jobs that they are assigned or
choose to do. The concepts are inter-related and cannot be
neatly separated. It is therefore not possible to isolate one
and concentrate on it without missing something and applying
the others less well.
Concept #1 - Exercise and practice leadership as well as
management.
Starting at the nominal “top” of any organization, its
“management” group or individual, I find that many problems
could be solved easily and most effectively if attention is paid
to the functions that those in top positions of organizations
should be performing. In my mind the most important of these
functions can be summarized as follows:
- Providing direction, objectives and goals through effective
  (two-way) communications,
- Obtaining (and keeping) the resources needed for people to
  do their jobs most effectively and efficiently,
- Removing impediments to reaching the ultimate objectives of
  the organization,
- Adding or removing constraints that are needed to keep the
  organization focused,
- Thoroughly understanding and constantly refining the
  processes that the organization has to execute to serve its
  customers,
- Providing leadership.
Unfortunately for those subject to management with little or no
leadership, one finds that a great deal of time is spent on
organization and reorganization, a waste of time for anything
other than communications, in my opinion.
In the course of executing management functions, however,
“managers” many times do not (or do not know how to) exercise
leadership. The goal of this paper is not to teach
leadership, but merely to call attention to the absence of it in
many instances and where to find information on what it is and
how to learn about it.
One of the key ways of exercising leadership is to learn to
listen. Listen to those in positions above (who must serve
all below), co-equal persons alongside you and employees
in positions below yours (those you serve). Combining the
two key elements of leadership and continuous process refinement
mentioned above and discussed in detail in the references listed
in footnotes below, the best way to obtain and maintain the
processes related to maintenance and reliability (and the
overall business processes from which the latter flow) is to
engage key employees in technical and first line supervisory
positions of the organization in developing them. The initial
process diagrams should be complete with the identification of
existing impediments, needed (and unnecessary) constraints and
resources needed to make them most effective and efficient
(clean and clear) in actual practice. It’s the manager’s job
(and should be his or her “agenda”) to make this happen, once
the list of impediments, constraints and resources is prepared.
He or she must get busy and act as decisively and rapidly as
possible to clear any obstruction to success of the processes
needed to serve both internal and external customers. In so
doing, the smart manager goes far beyond obtaining “buy-in” and
creates a sense of “ownership” in the processes by all who
participate in development and refinement, a far more valuable
attribute.
Concept # 2 – Look first at “programmatic” rather than
“technical” solutions to reliability problems.
In the decade after the 1979 Three Mile Island incident
involving a partial nuclear reactor core meltdown, the organization
responsible for regulation of commercial nuclear power plants
and related facilities, the U.S. Nuclear Regulatory Commission (USNRC),
searched for ways and means to avoid future events of this
nature. Many of the solutions involved redesign of power plant
control rooms and instrumentation as well as plant safety
systems for mitigating and controlling any malfunction at its
early stages. This was nothing new to those in the regulatory
organization. It was revisiting reactor and safety system
designs that it had concentrated on from the USNRC’s beginning
even before separation from the Atomic Energy Commission and the
Department of Energy. What was new to them was the need
to regulate the operation and maintenance of existing plants,
something that was in the early stages of development and
implementation when the 1979 incident occurred. In fact the way
the applicable law and regulations were written in 1979, USNRC
had only limited authority to regulate maintenance practices. It
was not until the mid 1990’s that, despite intense lobbying and
resistance by nuclear utilities and their various industry
organizations, the “Maintenance Rule” became law, thus giving
the USNRC authority its Commissioners felt was needed to assure
safety.
Among the many studies performed under USNRC sponsorship was one
concerning “programmatic root cause analysis.” A model was
developed addressing the essential precepts of the causes of
human errors in preventive and repair (often called
“corrective”) maintenance. The model was tested for several
years at a nuclear power plant which had new and open-minded
management (for reasons which will become obvious as you read
ahead). The model provided a set of four (4) “diagnostic query
diagrams” titled as follows:
· The Training Query
· The Procedures/Documentation Query
· The Quality Control (QC) Query
· The Management Query
Some surprising results came from this study, including but not
limited to the following:
- Although technicians performing maintenance were not
  eliminated as root causes of defective maintenance, their
  inadequate performance was found to be most likely the
  effect rather than the root cause of subsequent (infant or
  premature) equipment failures.
- The queries focused on isolating such root causes as line and
  upper management performance (including their attitude towards
  their responsibility for craftsperson performance), procedures
  and documentation (both the product and the process), training
  (both delivery and the process), managers of such program
  elements and quality control.
- Quality control was found not to be a primary cause, but a
  co-cause.
- Management was found to be both a primary cause and a
  co-cause of inadequate maintenance by technicians.
The landmark finding that management performance was often the
root cause of equipment premature or infant failure after
maintenance had monumental impact. Often, the continuation of or
granting of new operating licenses for nuclear generating plants
and related facilities rested upon the assessment by regulators
of attitude of nuclear utility managers at all levels towards
this cause. This led in some cases to major shakeups in
management teams until those in place could satisfy regulators
that they understood their shared responsibility for equipment
failures alongside those whose hands were actually on the
equipment.
This finding concerning management’s direct involvement in
equipment failures is seldom, if ever, applied outside the
commercial nuclear power industry.
Only now is it being considered for application (along with many
other initiatives) to the British Petroleum (BP) Refinery at
Texas City, Texas involved in a fatal accident on 23 March 2005
that resulted in 15 deaths and a much larger number of serious
injuries.
The costs were not confined to those on site and BP
stakeholders world-wide. The long outage that was needed for
disaster recovery (along with maintenance problems at other
plants for a variety of reasons) caused a shortage in refinery
products that everyone in the country paid for through a
spike in fuel prices for many months after the accident.
Another finding that ultimately came to light from the USNRC
sponsored study was that the majority of root causes
of infant or premature equipment failures could be eliminated or
at least mitigated and reliability improved less
expensively and more rapidly by programmatic
solutions than by technical (re-design) solutions. That’s
why the concept is stated as it is.
Those who have developed approaches to root cause analysis of
equipment failures all claim that their methods address the
queries listed above, and there is no doubt that this is true in
theory. However, in actual practice management and many other
programmatic causes of failure are considered co-incidental, if
not prohibited, areas of investigation requiring corrective
action. That is unless and until a major fatal accident occurs
and an outside, independent panel conducts an investigation, as
happened in the BP Texas City case.
Indeed, the programmatic root causes of failures may well go
beyond the industries that suffer from them. The rash in 2006
and 2007 of children’s toys and costume jewelry items containing
lead that had to be removed from store shelves certainly has as
one of its root causes the lack of oversight of foreign
manufacturers by their U.S. partners and the significant
reduction in staff and other resources for testing such items at
the U.S. Consumer Product Safety Commission (USCPSC). At least
one of the major toy distributors in the USA apologized to the
American people in testimony before a Congressional hearing
committee on the subject in mid 2007. On 11 September 2007, the
USCPSC and its mainland Chinese government counterparts
announced an agreement on work plans concerning safety of toys,
fireworks, cigarette lighters and electrical products.
Findings of contamination in food imports in a large number of
products from several countries, that got heavy media attention
during 2007, may be traced to the lack of U.S. Department of
Agriculture and/or customs inspectors at ports of entry and in
U.S. processing plants. Certainly these may also be traced to
lack of management attention or poor attitude, but a co-cause
may well be lack of resources resulting from political decisions
concerning regulatory agencies charged with inspection and
enforcement of the rules. Countries of origin may also share
some of this responsibility. In China, a whole new regulatory
regime is being created to address this deficiency and many
marginal producers have already been shut down as a result of
early action.
Concept # 3 – Look for indicators of small, seemingly
insignificant but repetitious reliability problems and act on
the findings.
Another feature of the report on programmatic root causes of
equipment failure was the description of an easily implemented
analysis method for determining where to apply limited
resources to solve many problems of equipment maintenance and
reliability. The method was given the title “Cluster Analysis.”
Cluster Analysis is explained in just three pages of the report
and consists of the following steps:
- Sort the data
- Identify clusters
- Determine which clusters are relevant
- Group the clusters into categories
- Determine the consequences of relevant clusters
- Determine technicians involved, when necessary
The table that follows provides an indication of how statistics
on relevant clusters look over time.
| Problem/Category Cluster | # in 1988* | # in 1990 |
| Damage | 28 | 15 |
| Tubing/Fitting | 13 | |
| Packing | 2 | |
| Seal/Gasket | 4 | 2 |
| Tightening | 19 | |
| Wiring | 1 | |
| Procedural | 18 | 10 |
| Diagnostics | 2 | 1 |
| Repair | 2 | |
| Weld | 2 | |
| Total | 98 | 28** |
* Actually an average of 1987 and 1988
** 71% reduction
This is the sort of analysis that can be carried out at any
level of an organization and should lead to triggering some root
cause analysis activity and follow-up. The numbers presented
reflect the reduction actually experienced at the plant where
this method was tested. As observers get more familiar with the
steps of cluster analysis, new clusters will emerge as earlier
ones are dealt with.
Many of these items in and of themselves may cause small delays
in production, but their cumulative effect over time can have
quite a substantial effect on the bottom line and/or mission of
the organization that engages in this relatively easy method of
identifying problems needing action.
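As an illustration only, the sorting and counting steps of Cluster
Analysis can be sketched in a few lines of Python. The records, the
`min_count` threshold and the function names below are hypothetical and
are not taken from the USNRC report. The last line simply reproduces the
arithmetic behind the 71% figure from the totals in the table (98 and 28).

```python
from collections import Counter

# Hypothetical maintenance-problem records: (year, problem category).
# The category names echo the table above; the individual records are
# invented purely for illustration.
records = [
    (1988, "Tightening"), (1988, "Damage"), (1988, "Tightening"),
    (1988, "Procedural"), (1988, "Damage"), (1988, "Wiring"),
    (1990, "Damage"), (1990, "Procedural"),
]


def find_clusters(records, year, min_count=2):
    """Sort one year's records and keep the categories that recur.

    min_count is an assumed cut-off for calling a repeated problem
    a cluster; it is not a value from the source report.
    """
    counts = Counter(cat for yr, cat in sorted(records) if yr == year)
    return {cat: n for cat, n in counts.items() if n >= min_count}


def percent_reduction(before, after):
    """Percentage drop from an earlier total to a later total."""
    return 100.0 * (before - after) / before


print(find_clusters(records, 1988))
print(round(percent_reduction(98, 28)))  # totals from the table above
```

A spreadsheet sorted by problem category would serve equally well. The
point is that recurring categories become visible as soon as the data
are sorted and counted.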
Concept #4 – Don’t be afraid of mistakes; learn from them.
Typically, after an incident involving substantial recovery
cost, injury or death, the search is started to find the
“guilty” parties so that they can be held accountable. This is
the wrong (management) approach in all but those cases where
malicious intent is apparent initially or determined to be a
cause in the course of investigation of the incident.
Managers sometimes contract for third party investigators to
find the root cause of serious events. This may be OK for the
overall look at what happened and to prepare a professional
report of findings. However, it has been shown to cause those
involved to take a defensive approach that impedes the full
story being revealed. This in turn inhibits appropriate action
being taken to eliminate or mitigate the true root cause,
liability claims and related court cases notwithstanding.
Policies and practices that have been proven to avoid repetition
of the problems described above are discussed below.
Adopt a No-fault Policy –
Adopt a no-fault policy regarding apparent accidents and
incidents. The policy should have a corollary provision that
emphasizes the need to learn and not suffer unnecessarily
from undesirable events. Stopping any attempt to “blame” someone
will aid in more quickly getting to the truth of what happened
and the ultimate solution. Learn from the mistake, correct
the problems, and get on with the business of serving
customers and providing all stakeholders with the fruits of
their investments in the organization. Those who fail to learn
from mistakes and repeat them should be assigned where they
can’t continue to cause harm to people or equipment or, as a
last resort, be let go.
Adopt a Compliance Policy –
Implement a compliance policy that applies to the use of all
operating and maintenance procedures as written or, if found
deficient in some way, as modified by competent personnel
following the approved procedures management process. (See
Concept # 5 – This assumes the organization has adopted a goal
of becoming a “Procedure Based Organization” which in turn has a
formal feedback and follow-up process in place that assures
prompt action on all recommendations for changes.)
Practice Peer Review –
When a major equipment failure and/or personnel injury/fatality
incident occurs involving one or more personnel, and there is an
opportunity for interview of those involved, institute a
practice of “peer review.” The purpose of this practice is to
fully identify what happened and what should be done to
prevent or mitigate a repeat of the incident in the future.
The chances of those involved producing an accurate picture
of what happened and coming to a conclusion as to what to do to
prevent such incidents in the future are greatly increased when
they are talking to their peers, without managers or other
“outsiders” present. The intent of protecting co-workers and
other stakeholders from repeating any errors must be the central
goal of such a practice. It can be effective, however, only
if the practice is backed by the no-fault policy stated above.
Ultimately it may be found, as indicated in the concept
describing programmatic root cause analysis, that management
needs to do something to eliminate or mitigate the problems
revealed. This may include providing more training, better
documentation or even changing their attitude concerning
provision of other resources needed to ensure no repeat of the
incident. The no-fault policy applies to management, also, as
viewed by those who are subject to its leadership.
Practice focusing on the incident at hand while it is being
investigated.
Anyone who has been involved in root cause analysis or
reliability centered maintenance analysis knows how easy it is
to have the process prolonged or even sidetracked by unrelated
issues that arise. There is always a strong desire by those
assembled to perform such tasks to discuss all the
perceived, current problems of the organization. Those
assigned to facilitate such analyses must acquire and liberally
apply the skill of diverting such discussions and re-focusing
attention on the matter at hand. One very effective way of
doing this is to start listing, by title only, “Other Items of
Interest” for presentation along with the report concerning the
incident at hand. Thus the group discussion can be refocused on
the incident, along with assurance that the list of Other Items of
Interest is prepared for presentation to management along with
(or included in) the root cause report.
Concept # 5 – Become a Procedure Based Organization, but don’t
overdo it.
In a variety of presentations I have written, co-authored or
contributed to, the emphasis has been on becoming a “PBO – a
Procedure Based Organization.” In this text an example is
provided where such advice was taken too far. A Procedure Based
Organization produces or receives and complies with detailed
written instructions for conducting not only maintenance, but
also operations and routine checks. This seems so basic that it
is overlooked in most organizations and for all the wrong
reasons! It’s so much easier than it used to be, given
availability of low cost word processing and scanning and image
insertion equipment. There is hardly any excuse for not doing
it, given the benefits derived in terms of increased reliability
and consistent delivery by the operators and maintainers of the
maximum possible capacity of a production line. The fundamental
approach is depicted in the diagram below.

Not only does an activity have to declare that it has a
Procedure Based Organization, but it has to back it up with
a working process for procedure and checklist origination,
dissemination, feedback and follow-up. The idea of feedback
and follow-up is reinforced in the diagram above by arrows
that imply two-way paths for communications. It is not
enough just to disseminate an initial set of procedures and
checklists. Users must have on-going evidence that
their ideas for improvement are being received,
considered and acted upon promptly. Changes that are
concurred in must be seen to be incorporated in revised
procedures and checklists coming out of a process that
functions as well as is expected of all the
maintenance and operations processes it supports. Otherwise,
enforcement of a policy requiring compliance will quickly
become impossible, because of a perception that management
support for the process and related policies is weak or
non-existent.
In July 2004 I conducted a one-day seminar in response to a
query concerning what it took to become the “world’s best
maintenance organization.” The activity where the seminar
was held had been operational for only 18 months after
rejuvenating a portion of a steel plant that had a hundred
year history before shutting down and going out of business
three years earlier. The new organization was doing quite
well, having returned the equivalent of 80% of its new
owner’s investment in the short time it had been operating
under new management and carefully selected staff. However,
all there knew that world steel prices, then inflated due to
the “China Bubble,” could very quickly deflate to where they
might not be competitive with foreign suppliers of the
products they manufactured. They saw maintenance as an area
where their equivalent profit margin (return on investment
to their owner) could be improved and their own jobs kept
securely in the USA. After attending the seminar, which
stressed, among other things, use of detailed procedures and
checklists for both operations and maintenance,
management decided to apply the principles to startup of one
of their most complex manufacturing processes. The operating
and maintenance staff prepared a check-off list for start up
of all systems needed to roll steel bars into coils of wire
ready for shipment. Typically this evolution, which occurred
every Monday morning, was fraught with multiple delays while
the systems involved were aligned correctly and adjusted to
the required level of throughput.
About two weeks after the seminar, I followed up with the
company president. He volunteered that they had applied the
rolling line startup check-off list for the first time that
week. They decided to run the check-off twice before the
first bar of steel was introduced to the line. They found in
the first check that they had missed two items. After
correcting these items before the second run-through of the
checklist, the startup went without any delay or incident, a
first for that plant under the new staff. If ever there was
a “Hallelujah Moment,” for one preaching the benefits of
detailed procedures and checklists, that was it for me.
In the summer of 2005 I conducted a procedure and checklist
workshop for Gallatin Steel Company in Kentucky, which is
owned jointly by Brazilian and Canadian firms. Following the
lead of one of its owner companies (Dofasco of Hamilton, ON,
Canada, which that year had been declared by The Wall Street
Journal the most profitable steel company in the world) the
management decided to embrace a key element of the parent
company’s success -- use of detailed procedures and
checklists for maintenance. In the course of the workshop
conducted for key technicians and supervisors (with managers
present only for the beginning and ending sessions) a
detailed process was developed for origination and on-going
support of procedures and checklists. A format and detailed
outline were decided upon for the actual documents and the
decision was made to produce all of them in house, using
overtime to pay those craftspersons who volunteered to write
the procedures.
Two years later, following up with the project manager, I
found that the organization had produced over 500 detailed
preventive and repair maintenance procedures and checklists.
In response to a request for an opinion on what the major
benefit was from all this effort, he responded by saying
that the biggest benefit was the significant increase in
confidence that the work force had gained in performing
maintenance. Delays and frustration with not having the
correct tools or replacement parts were radically reduced.
The company has been rated by the Kentucky Chamber of
Commerce and the State Council of the Kentucky Society for
Human Resource Management as one of the best to work for in
the state. Forbes Magazine ranked Gallatin 16th overall
as best large company to work for in the USA in
2006. Its parent, Dofasco, has consistently received similar
recognition in the Province of Ontario and in Canada overall
as one of the best places to work.
On the downside of this concept, it is possible to demand
too much of the craftspersons who are required by a
compliance policy to use procedures and checklists. Recently
I was requested to participate in a conference call with the
representative of a major corporation which on any given day
operates about 1100 facilities world-wide. Also on the
conference call were representatives of one of their
contract maintenance suppliers. The craftspersons of the
contractor were resisting the imposition of mandatory
check-offs (by initialing) for each step of every
maintenance procedure they were required to conduct. In
addition, a rigorous audit procedure with punitive
provisions for non-compliance by maintenance personnel had
been prepared for implementation as part of the customer’s
compliance policy. The craftspersons who were pushing back
had, in my opinion, a good case for doing so. The client had
gone way beyond the best practice in use of procedures and
imposition of a companion compliance policy.
In organizations engaged in this best practice several types
of procedures (and checklists) are commonly used. These
types are summarized in the table below. The basic ones are
given titles like Standard Operating Procedures (SOPs),
Special Operating Procedures (SpOPs), Critical Operating
Procedures (COPs), Standard Maintenance Procedures (SMPs),
Special Maintenance Procedures (SpMPs), Critical Maintenance
Procedures (CMPs), Preventive Maintenance (PM) or Predictive
Maintenance (PdM) procedures. Standard, PM and PdM
procedures define common, often repeated, operations,
maintenance or condition monitoring tasks.
All but critical procedures may be written in “two-tier”
format. The first tier is an abbreviated version of the
second tier that provides a more in-depth explanation and
additional steps for use in training of new personnel or
occasional review by experienced personnel who may not have
performed the standard task for some time.
Operating procedures in organizations following current best
practice often contain many routine preventive maintenance
tasks which are assigned to operators for completion. Note
that individual sign-off on each step is required only for
safety or critical task procedures (and checklists).
Typical Procedure and Checklist Categories
| Procedure/Checklist Type | Where Used | Manner of Using |
| Safety or Critical Task (COP, CMP) | Complex evolutions where safety of personnel and/or hazards to equipment are principal concerns | Verbatim compliance; reader/worker team approach; individual step sign-off |
| Standard Task (SOP, SMP, PM, PdM) | Operating and maintenance procedures for common, often repeated tasks | Procedures available on file; used as training documents as well; can be taken on job site if needed; captures experience; utilizes skill of the craft |
| Special Task (SpOP, SpMP) | Procedures for major, complex and/or infrequent maintenance and/or related operational procedures (e.g., for post-maintenance testing) | Procedure is part of work package; maintained on file; used on job site as reference, especially if it contains check-off lists or data collection requirements; includes post-repair tests |
During the conference call, I emphasized the need for “trust” in
the client-contractor partnership that extended to the conduct
of operations and, in this case, maintenance. I recommended the
audit requirement be abandoned completely and that the
procedures be categorized per the definitions in the table above
with individual steps required to be checked off only for safety
and critical maintenance tasks.
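The sign-off rule recommended above can be expressed as a simple lookup.
This is a minimal sketch, assuming only the three categories and the
abbreviations given in the table. The names `Category`,
`CATEGORY_BY_TYPE` and `requires_step_signoff` are hypothetical and do
not come from any particular maintenance management system.

```python
from enum import Enum


class Category(Enum):
    """Procedure categories from the table above."""
    CRITICAL = "Safety or Critical Task"
    STANDARD = "Standard Task"
    SPECIAL = "Special Task"


# Abbreviations from the table mapped to their category.
CATEGORY_BY_TYPE = {
    "COP": Category.CRITICAL, "CMP": Category.CRITICAL,
    "SOP": Category.STANDARD, "SMP": Category.STANDARD,
    "PM": Category.STANDARD, "PdM": Category.STANDARD,
    "SpOP": Category.SPECIAL, "SpMP": Category.SPECIAL,
}


def requires_step_signoff(procedure_type: str) -> bool:
    """Only safety or critical task procedures need each step initialed."""
    return CATEGORY_BY_TYPE[procedure_type] is Category.CRITICAL


print(requires_step_signoff("CMP"))
print(requires_step_signoff("SMP"))
```

Keeping the rule in one place like this makes it easy to see that
step-by-step sign-off is the exception rather than the default.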
Concept # 6 – Eliminate as much maintenance as possible and
increase emphasis on reliability.
In the past the traditional view was that the two goals stated
in the concept statement above are contradictory and impossible
to achieve. However, this is not the case. More maintenance does
not produce more reliability per se. In fact it can be a root
cause of reduced reliability. If the organization has created
the optimum maintenance program and knows exactly what
maintenance to perform (that which is cost-effective and
applicable, i.e., it works, as a result of proper application
of Reliability Centered Maintenance (RCM) methodology) and
exactly how maintenance should be done (a result of
proper application of Total Productive Maintenance (TPM)
principles), then the stage is set for concentration on the
goals stated in the concept above.
A major pillar of TPM and one that is often neglected may be
stated as “Manage equipment in order to prevent maintenance.”
Much can be done at the design stage to eliminate, reduce or at
least minimize the hours spent maintaining equipment through
application of maintainability principles and choosing
components with generous service factors. However, most of the M
& R world is faced with the equipment already in place and in
production, acquired on a lowest purchase and installation cost
basis. Thus, the challenge is to improve the reliability and
maintainability of the equipment we have, not the equipment we’d
like to have. Some texts on TPM refer to this as “Corrective
Maintenance,” a term that means, in the context of TPM,
modifying the equipment in service to improve its design and by
extension its capacity to reliably produce a product or service
at lowest possible overall conversion cost.
The cost reduction from increased reliability and decreased
maintenance can be significant. It affects the overall
conversion cost of a product or service. The reduction in cost
directly affects the profit margin, and/or makes it possible for
a company to offer cost savings to customers, thus improving
competitive position in the marketplace.
Unfortunately, the management error that is often committed is
to mandate maintenance cost reduction without compensating by
providing a comparable improvement in reliability or
maintainability, both of which require labor hours on a
never-ending, continuous basis. This is exactly the opposite of
what should be done. More often than not the decision maker gets
away with it in the short term. This is because the “easiest” target
for cost reduction is most often maintenance personnel (layoffs
of “excess” personnel). Typically such action causes a pullback
from proactive maintenance and a fallback to reactive
maintenance (high priority, if not emergency repairs). This
results in a more costly approach as time goes on,
especially when lost opportunity costs are considered. The full
impact may not be felt for many months, and in some cases up to
2 years. When the percentage of inoperative equipment reaches an
intolerable point, and maintenance personnel are again augmented
so labor hours can again be devoted to proactive measures, it
takes about two more years to fully recover to the high
point of performance where the layoffs began. In fact a study
performed at Massachusetts Institute of Technology (MIT) shows
that these cyclic events do, in fact, occur.
What should be done when excess labor hours become available
after proactive maintenance practices take effect? The
nature of the jobs experienced maintenance personnel are
performing should be changed! Emphasis should be placed on
the following:
This is done by acquiring and putting in place and using:
-
Rules -
related to best practices in maintenance and
reliability
-
Tools –
acquisition, application and continual updating for
maximum productivity
-
Schools -
to teach the new skills needed by modern maintenance
organizations
For more on rules, tools and schools, see the discussion
under the next concept for avoiding human error in
maintenance and reliability.
A true story involving a 30-year-old aluminum production
plant reflects this cyclic effect. In the mid-1990’s the
plant changed hands from foreign to U.S. owners. The new
owners hired new plant, maintenance and reliability managers
to see what they could do to improve profitability of the
aged but still reasonably profitable plant. New management’s
initial assessment showed that the throughput under the
maintenance strategy they inherited was only about 50% of
the designed-in capacity. Given that the company could sell
everything it could produce, the team set out to increase
throughput by changing the strategy to a more proactive one.
A vigorous predictive maintenance program was instituted.
Root cause analysis was performed and reliability improvements
were made to existing equipment. Within about two years the throughput
had been increased to about 75% of projected maximum
capacity. The owners were making so much money they
purchased three more aluminum plants in a distant state. The
plant manager was promoted to vice president and his
replacement recruited from another aluminum producer, with
the promise of autonomy in running the operation.
Within the first week, the newly hired plant manager made
his views concerning further improvement well known. He told
the predictive maintenance team that he didn’t understand
what they were doing and recommended they bid for jobs where
they had “real” tools in their hands to “fix” things.
Reliability improvement initiatives were put on “hold.”
The reliability manager quickly found a new position in
the expanding corporate office. His assistant, hired to
oversee RCM projects, found employment at a nuclear
power plant. The predictive maintenance team leader (in
a salaried position) was stuck for a time while he
finished a master’s degree program at a local
university, but he, too, left for a job managing
contract maintenance at a new steel plant.
For months, throughput remained where it was when the
new plant manager took over. But in about a year there
was a distinct downturn in production. The vice
president paid a visit to review performance and found
only the original maintenance manager from his “dream
team” still in place -- but scheduled within two weeks
to move to another company where he had accepted an
offer of a maintenance manager position.
The vice president found out from the soon-to-depart
maintenance manager what had happened, confronted the
plant manager and fired him, assuming the plant
manager’s duties in addition to his own. The maintenance
manager was promoted to be plant manager at one of the
newly acquired facilities.
The vice president has been trying to reverse the
downward trend in production ever since by building a
new team and restoring the confidence of the union staff
members who remained in or returned to their hard-won,
higher paying predictive maintenance positions.
Concept # 7 - Don’t forget the roots of your M & R
program initiatives for improvement.
It is not uncommon, with so many new initiatives
being offered in the field of M & R, to see earlier,
even highly successful principles and methodologies
abandoned or forgotten with the promotion, retirement
and transfer of those who implemented them.
One of the earliest adopters of RCM methodology
developed for commercial aircraft was the U.S. Navy. In
the 1970’s and 1980’s vigorous efforts were undertaken to
change maintenance from a more costly, shipyard-based
strategy to one anchored in RCM and operating base
support.
By the 1990’s, most of those engaged in implementing the
“new” RCM-based approach had retired or moved on to
other jobs, due in part to the post-Cold War draw-down
in naval forces and related support facility manning.
By the late 1990’s the Navy found that its maintenance
programs were in need of overhaul and revitalization in
order to ensure reliability in the face of an apparent
return of intrusive maintenance requirements that were
superimposed on the RCM-based strategies (which differed
from class to class of ship and submarine). In addition,
while the specifications for building ships still
contained the Department of Defense mandated requirement
to provide an RCM-based maintenance program, new methods
of contracting for ships often resulted in these efforts
being under-funded and inadequately implemented. The
ship builders often simply implemented original
equipment manufacturer (OEM) recommendations, which had
been determined in the studies done decades earlier to
be heavily tilted towards regular “overhaul,” requiring
heavy life-cycle replacement parts costs. The OEMs did
what the ship builders asked and were benefiting
handsomely as a result.
Luckily for the Navy and U.S. taxpayers, some
“old-timers” still remained in civil service who had by
this time achieved positions with sufficient clout to
rectify this problem. They devised a revitalization
initiative to avoid inapplicable and ineffective
maintenance and reduce maintenance costs without
sacrificing reliability. The initiative was based on
three parallel efforts:
·
Rules
– Improving maintenance requirements and plans
(including reliability improvements)
·
Tools
– Using computer and diagnostic technology (i.e.,
condition-based maintenance)
·
Schools
– Educating all levels of maintenance decision makers in
reliability and condition-based maintenance principles
Commercial organizations suffer from the same problems
that the Navy did in the 1990’s. Consumers, and those who
promote keeping as many core industries in our country as
possible, pay the price for this error by humans engaged in
maintenance and reliability. The error is that they forgot
(or never learned about) the past.
Conclusions:
The paper has expounded on seven (7) concepts that are
essential to minimizing error by those engaged in
Maintenance and Reliability. These are summarized below:
-
Exercise and practice leadership as well as
management.
-
Look first at “programmatic” rather than “technical”
solutions to reliability problems.
-
Look for indicators of small, seemingly
insignificant but repetitious reliability problems
and act on the findings.
-
Don’t be afraid of mistakes; learn from them.
-
Become a Procedure Based Organization, but don’t
overdo it.
-
Eliminate as much maintenance as possible and
increase emphasis on reliability.
-
Don’t forget the roots of your M & R program
initiatives for improvement.
Having heard about those listed above, I’m certain those
who read this text or hear it presented can come up with
many more ideas on reducing the occurrence and impact of
human error in maintenance and reliability. However,
concentrating on these will make a big difference in
achieving the goals and objectives of your organizations.
The mantra for modern maintenance and reliability
programs everywhere could well be “Rules, Tools and
Schools!”
|