Ledet’s
analysis suggests that if companies focus on planning
only, they will improve their uptime by 0.5%. If they
focus only on maintenance scheduling, uptime will
improve by 0.8%. If they focus only on preventive and
predictive maintenance, uptime will actually get
worse by 2.4%. If organisations focus on all three of
these aspects, they will gain a 5.1% improvement in
availability.
These results may well sound appealing in their own
right, but Ledet subsequently found that adding
defect elimination to these initiatives could achieve
a further 9.7% improvement in plant availability,
taking the total improvement to 14.8%. This
information is provided in the table at Figure 2.
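Ledet’s figures make the case for integration arithmetically: the isolated gains do not add up to the combined gain. The short Python sketch below simply recomputes the percentages quoted above; the variable names are ours and nothing beyond those published figures is assumed.

```python
# Availability changes (percentage points) quoted from Ledet's analysis.
isolated_gains = {
    "planning only": 0.5,
    "scheduling only": 0.8,
    "preventive and predictive maintenance only": -2.4,
}
combined_three = 5.1            # all three initiatives together
defect_elimination_extra = 9.7  # further gain from adding defect elimination

# What we would expect if the initiatives simply added up.
additive_estimate = sum(isolated_gains.values())
# The extra gain that appears only when all three are done together.
integration_effect = combined_three - additive_estimate
total_with_defect_elimination = combined_three + defect_elimination_extra

print(f"Sum of isolated gains:   {additive_estimate:+.1f}%")
print(f"All three combined:      {combined_three:+.1f}%")
print(f"Integration effect:      {integration_effect:+.1f}%")
print(f"With defect elimination: {total_with_defect_elimination:+.1f}%")
```

The isolated gains sum to -1.1%, yet the three together yield +5.1%, so 6.2 percentage points of the benefit come from integration rather than from any single initiative.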
The relationships
between the four quadrants of Reliability Assurance
and the process elements studied by Ledet are shown in
Figure 3.
This relationship suggests that improving planning and
scheduling by implementing a CMMS without having a
focussed PM program will not generate significant
returns. Similarly, working hard at developing a
focussed PM program without a good planning and
scheduling system will not generate significant
returns either. The suggestion is that organisations
should work on both their CMMS planning and scheduling
systems and their maintenance strategy development.
The other important factor is that defect
elimination is the process that provides the
greatest improvement opportunity. If this is the case,
then the intuitive approach to securing improvement
would be to focus on defect elimination first and then
work on the other elements.
However, this
approach will not work without a strong foundation of
preventive maintenance, because without good
preventive maintenance, reactive maintenance will
prevail. In reactive mode, a high percentage of
failures will be caused by a lack of maintenance, not
by inherent problems with machine design or operation.
In this situation, any program to work on
defects will, in all likelihood, be unable to
determine whether a failure was due to a lack of
maintenance or to design. In addition, the
volume of defects to analyse will probably be too high
to cover, making the defect elimination program
exhausting and ineffective.
The proposed starting
point must therefore be to get the fundamentals of
effective PM in place, which means that the first step
in launching a Reliability Assurance program is to
review the maintenance strategy.
The functions of
a Maintenance Management System include the following:
·
Administration and Execution,
·
Work orders,
·
Work history,
·
Spare parts,
·
Cost control,
·
Contractor management, and
·
Planning and Scheduling.
There is considerable
literature available on setting up and using a
maintenance management system, so this paper does not
focus on that area. There are, however, some factors in
the implementation of maintenance management systems
that make effective Reliability Assurance difficult to
accomplish. These factors are discussed in the
following paragraphs.
One of the problems in
getting a Reliability Assurance program established is
that many people in organisations believe that the
CMMS can do more than it is capable of. To put it
bluntly, few, if any, CMMSs have the data
fields and data relationships necessary to provide
information in the manner required for maintenance
strategy development, plant performance management,
and incident management. It is very important to
recognise that all of these systems are distinctly
different, yet need to be integrated. The purchase and
use of a CMMS alone will leave a large void in the
Reliability Assurance infrastructure.
To be effective at
Reliability Assurance, an organisation needs systems,
software and work processes that deal with the other
three quadrants.
Many CMMSs are poorly used and not set up properly.
There may be many reasons for this. One is that in
setting up a CMMS, the organisation has seen it as a
tool to manage maintenance administration alone and
has not understood the concepts of Reliability
Assurance. In computerising its systems, it has
failed to realise that a system to administer
maintenance will not work well if the underlying
programs are not well defined or do not add value.
The startling result of most PMO2000™ analyses
completed over an eight-year period is that barely
fifty percent of the maintenance strategies contained
in the CMMS remain unchanged after review. It is
common to delete 15% of the maintenance because it adds
no value, and to add the same amount to manage
failures that are preventable but have no PM.
The remainder of the changes are interval
extensions or reductions, or moves from time-based
overhaul maintenance to condition-based maintenance or
vice versa.
Trying to plan and schedule a PM program that is only
50% effective cannot be good management.
Reliability incidents can be defined as failures of
plant and equipment that lead to any kind of loss or
increased risk to the business. In capital-intensive
industries, the categories vary in terms of exposure
and likelihood. Typically, however, they include the
following:
·
Threat to safe operation,
·
Threat to the environment,
·
Threat to the commercial viability of the company,
·
Loss of customer satisfaction,
·
Loss of production or failure to complete the mission,
·
Breach of security, and
·
High repair cost.
The purpose of incident management is to identify and
resolve plant or human failures that result in greater
exposure to, or the loss of, any of the above.
Ideally, organisations would take steps to remove such
risks before they occur; in practice, however,
predicting every risk and reducing each of them to
acceptable levels is very difficult.
Many organisations have found that they can create a
focussed maintenance strategy for all equipment
(critical and non-critical) within 12 months by taking
a review and rationalisation approach. The problem
with most maintenance strategy development activities
is that information is never perfect, so assumptions
must be made. This means that the maintenance strategy
is a living program that needs incident management to
keep it current and developing as better information
comes to hand.
In addition to incorrect assumptions, there are other
factors that could cause unexpected equipment
failure. Some of these factors are as follows:
·
Temporary repairs installed and not removed,
·
Maintenance error caused by poor training or lack of
adherence to procedures,
·
Maintenance not being done on time,
·
Incorrect operation of equipment, and
·
Faulty parts installed.
The process undertaken to review reliability incidents
is relatively simple and quite commonplace. It
follows a typical investigation cycle found in many
problem-solving techniques. At a high level, the
generic process that we prefer to use has seven steps,
listed below:
·
Originate
·
Allocate Analysis Responsibility
·
Analysis and Recommendations
·
Approve
·
Implement
·
Review, and
·
Close
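The seven steps can be modelled as a simple sequential workflow. The Python sketch below is a minimal illustration, assuming each incident moves strictly forward one step at a time; the names `Step`, `Incident` and `advance` are hypothetical, and a real system would also need to handle, for example, recommendations that are rejected at the Approve step.

```python
from enum import Enum


class Step(Enum):
    """The seven generic incident-management steps, in order."""
    ORIGINATE = 1
    ALLOCATE_ANALYSIS_RESPONSIBILITY = 2
    ANALYSIS_AND_RECOMMENDATIONS = 3
    APPROVE = 4
    IMPLEMENT = 5
    REVIEW = 6
    CLOSE = 7


class Incident:
    """Tracks one reliability incident through the seven steps (hypothetical model)."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.step = Step.ORIGINATE
        self.history = [Step.ORIGINATE]

    def advance(self) -> Step:
        """Move to the next step; a closed incident cannot advance further."""
        if self.step is Step.CLOSE:
            raise ValueError("incident is already closed")
        self.step = Step(self.step.value + 1)
        self.history.append(self.step)
        return self.step

    @property
    def is_open(self) -> bool:
        return self.step is not Step.CLOSE


incident = Incident("Conveyor gearbox bearing failure")  # hypothetical incident
while incident.is_open:
    incident.advance()
print([s.name for s in incident.history])
```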
As this approach is so well known, it is not
considered necessary to discuss each step. However,
it is worthwhile expanding on one unique aspect that
pertains to Reliability Assurance. When reviewing
equipment failure, there is a specific process
flowchart that we recommend should be followed. The
process is shown below in Figure 4.
The starting point [F] is any unexpected failure that
has occurred in the plant. The first step is to
define the failure mode or mechanism of failure.
Following this [Failure Analysed?], it needs to be
determined whether this failure mode has been analysed
previously using RCM / PMO logic. If it has not [N],
then it should be put through an RCM / PMO analysis
[Apply RCM / PMO]. If it has been reviewed [Y], then
the validity of the previous review needs to be
assessed against the fact that the failure has now
occurred unexpectedly [Failure Prevented?]. The
previous analysis may have recommended a “No Scheduled
Maintenance” policy, in which case the outcome was
expected and no further action need be taken, unless
the failure has now become more of a problem than
originally thought [Increasing problem?]. In that case,
modifications and a revision of the RCM / PMO should
be undertaken based on the decreased reliability.

If, however, the recommendation was for PM and the PM
has failed [System Downfall], then the
source of the problem needs to be identified and
rectification action taken.
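The decision logic of Figure 4 can be expressed as a short function. This is a sketch only: it assumes the previous RCM / PMO results are held in a dictionary keyed by failure mode, which is one way of retrieving the strategy for any failure mode quickly. The names `StrategyRecord` and `review_unexpected_failure`, and the example failure modes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StrategyRecord:
    """Hypothetical record of a previous RCM / PMO decision for one failure mode."""
    policy: str  # "PM" or "NSM" (No Scheduled Maintenance)


def review_unexpected_failure(
    failure_mode: str,
    strategies: dict[str, StrategyRecord],
    increasing_problem: bool = False,
) -> str:
    """Return the action suggested by the failure-review flowchart."""
    record = strategies.get(failure_mode)
    if record is None:
        # [Failure Analysed?] = N: the mode has never been through RCM / PMO.
        return "Apply RCM / PMO"
    if record.policy == "NSM":
        # The failure was expected under a No Scheduled Maintenance policy.
        if increasing_problem:
            # [Increasing problem?] = Y: reliability is worse than assumed.
            return "Modify and revise RCM / PMO"
        return "No further action"
    # PM was recommended, yet the failure still occurred: [System Downfall].
    return "Identify source of PM failure and rectify"


strategies = {
    "pump seal leak": StrategyRecord(policy="PM"),
    "motor bearing wear": StrategyRecord(policy="NSM"),
}
print(review_unexpected_failure("impeller erosion", strategies))
print(review_unexpected_failure("motor bearing wear", strategies))
print(review_unexpected_failure("pump seal leak", strategies))
```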
Clearly, to undertake this work, the organisation
needs an efficient means of retrieving the
maintenance strategy for any given failure mode. This
again shows the need to conduct either RCM or
PMO2000™ before deploying an incident management
system.
Data collection in the maintenance environment can
take many forms. It is important to collect data
about plant condition and what maintenance work has
been done. However, that data is not the data most
needed for Reliability Assurance. The data required
for Reliability Assurance is data relating to
equipment failure and the circumstances surrounding
that failure. There needs to be a clear distinction
between these different types of data when setting up
a strategic data collection system.
Even though the data required for Reliability
Assurance varies between sites, the following
generalisations apply to the vast majority of cases.
Machines exist 24 hours of every day they are on the
company register. During this time, they may be in a
number of states. Some of these states are listed
below:
·
Having upgrades or modifications,
·
Not required for production,
·
In transit or being changed to different products,
·
In production,
·
In planned maintenance, and / or
·
In breakdown maintenance after having suffered a
failure and being repaired or running at a reduced
rate.
Many companies track these states and establish a
figure that compares production time to total time.
This figure, when de-rated for quality and throughput
losses, is often called Asset Utilisation or
Total Effective Equipment Productivity (TEEP).
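As a worked illustration, the sketch below computes such a figure by multiplying the time ratio by a rate factor and a quality factor, which is a common way of de-rating it. Exact definitions of the factors vary between sites, so treat the function `asset_utilisation`, its parameter names and the example week as assumptions.

```python
def asset_utilisation(
    production_hours: float,
    total_hours: float,
    rate_factor: float,
    quality_factor: float,
) -> float:
    """Production time as a fraction of total time, de-rated for throughput and quality.

    rate_factor:    actual rate / standard rate while producing (0..1)
    quality_factor: good output / total output (0..1)
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    time_ratio = production_hours / total_hours
    return time_ratio * rate_factor * quality_factor


# Hypothetical week: 168 h on the register, 126 h in production,
# running at 90% of standard rate with 98% good output.
teep = asset_utilisation(126, 168, rate_factor=0.90, quality_factor=0.98)
print(f"Asset utilisation: {teep:.1%}")
```

Here 75% of the time is spent in production, but once rate and quality losses are applied the figure falls to roughly 66%.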
This paper is primarily
concerned with machine reliability and is therefore
concerned only with the last two states on the above
list: planned maintenance and breakdown maintenance.
It should be noted that this paper is restricted to
the analysis of evident failures, as hidden failures
by definition do not of themselves cause operational
loss.
These two reliability elements can be expanded as
follows:
·
Planned Maintenance
o
Preventive Maintenance, or
o
Corrective Maintenance
·
Breakdown Maintenance
o
Expected Failure - Equipment breakdowns that have been
assessed as “No Scheduled Maintenance”, or
o
Unexpected Failure - Equipment breakdowns that should
have been predicted or prevented.
Inherent in these elements are some concepts that need
to be understood clearly in order that the Reliability
Assurance approach makes sense. These concepts are
explained with the assistance of the models shown in
Figures 5 and 6.
The reliability and performance of any machine are
determined by two factors:
·
The way the machine was designed, and
·
The way it is operated.

Failure characteristics and economics are such that
for some failures, the defined maintenance strategy is
“No Scheduled Maintenance” (NSM). This may be for
either of the two reasons described below:
·
The failure is random, and the P-F interval is too
short to be of any use, or
·
The cost of prevention is more than the cost
of the failure.
This reality means that
there will be a certain level of unavailability
inherent in the design and operating conditions.
Failure modes which have NSM strategies will
inevitably become breakdowns and result in capability
loss. We call such failure modes Expected Failures.
This is because over the life of the asset, it is
expected that such failures will occur and result in
loss of production.
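The two NSM criteria can be written as a simple test. The sketch below is an illustration under stated assumptions: `min_useful_pf_days` (the shortest warning period that could still be acted on) and the cost figures are inputs the analyst would have to estimate, and `FailureMode` and `is_nsm` are hypothetical names, not part of any RCM or PMO2000™ tool. The example failure modes and costs are invented.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Hypothetical summary of one failure mode for a strategy decision."""
    name: str
    is_random: bool           # failure pattern shows no age-related wear-out
    pf_interval_days: float   # warning period from potential to functional failure
    prevention_cost: float    # cost of preventing the failure over the period considered
    failure_cost: float       # cost of letting it fail over the same period


def is_nsm(mode: FailureMode, min_useful_pf_days: float) -> bool:
    """True if either of the two NSM scenarios applies."""
    pf_too_short = mode.is_random and mode.pf_interval_days < min_useful_pf_days
    prevention_dearer = mode.prevention_cost > mode.failure_cost
    return pf_too_short or prevention_dearer


modes = [
    FailureMode("fuse failure", True, 0.0, 500.0, 2_000.0),
    FailureMode("light fitting failure", False, 30.0, 1_200.0, 300.0),
    FailureMode("gearbox wear", False, 60.0, 800.0, 25_000.0),
]
for m in modes:
    print(m.name, "-> NSM" if is_nsm(m, min_useful_pf_days=7.0) else "-> PM")
```

Failure modes flagged as NSM here are the ones that, over the life of the asset, would become Expected Failures.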
While some failures or breakdowns will be accepted
as inevitable, others will be prevented either
through condition monitoring or fixed-time
replacement. Where these preventive actions require
the plant to be taken offline, the preventive
maintenance is another inherent loss.
In addition, condition monitoring may detect the
onset of failure, and the rectification action taken
in such cases may require the plant to be taken
offline.
All of these losses combine to form the Inherent
Capability Loss shown in Figure 5. The Inherent
Performance Level is therefore the total time less
the Inherent Capability Loss.
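Written as a formula (the symbols are ours, not the paper's), where T is total time, L_NSM is the time lost to expected failures under NSM policies, L_PM is preventive maintenance that requires the plant to be offline, and L_CM is offline rectification following condition monitoring:

```latex
\begin{aligned}
L_{\mathrm{ICL}} &= L_{\mathrm{NSM}} + L_{\mathrm{PM}} + L_{\mathrm{CM}} \\
P_{\mathrm{inherent}} &= T - L_{\mathrm{ICL}}
\end{aligned}
```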
If the PMO/RCM maintenance analysis was done
correctly, and the machine is
maintained and operated according to the approved
process, then it should suffer no unexpected
failures. This is not to say that the plant will
not fail; rather, all the failures the plant
experiences will be expected. The reality in
most organisations is that some failure modes that
receive PM still fail unexpectedly, during production.
These losses are shown in Figure 6.
The steps suggested are discussed in the following
paragraphs.
Setting up a generic data collection system is not
difficult in theory. In practice, however, people are
usually involved in collecting the data, entering it
into computers, and using it. The following is a list
of important factors that should be considered early
in the development of the data collection system.
·
Data collection systems often have a large number of
interested parties or stakeholders. For this
reason, data collection strategies should not be
created by a single person with only one agenda in
mind.
·
Data is often collected by people who operate
machines. Their degree of literacy and numeracy
should be assessed and considered. Using codes
instead of written notes is a good idea.
·
Consider which codes are applicable, as the
loss codes used for reporting determine the reports
that can be generated.
·
In most cases, someone has to enter the data into a
database. It is important to minimise the effort
involved in keying in information.
·
In some cases it is important to collect data about
rate variance and quality loss. If these are
important, then plans should be put in place
to account for these losses. This data can be
difficult to obtain accurately; however, it is
often worth making some assumptions and implementing
something rather than waiting for the perfect
solution to be found.
·
The system should be reconcilable. This means that
the actual output plus the losses should amount to
the standard rate multiplied by the time in
production.
·
Information about plant failure is best entered after
the fault has been corrected. The system should be
designed so that codes are entered once the
rectification is complete, not at the time the
fault occurs.
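The reconciliation rule lends itself to an automated check. This Python sketch assumes every loss has already been converted into units of output and that a small relative tolerance (here a hypothetical 1%) is acceptable; the function name `reconcile`, the loss categories and the figures are illustrative only.

```python
def reconcile(
    actual_output: float,
    losses: dict[str, float],
    standard_rate: float,
    production_hours: float,
    tolerance: float = 0.01,
) -> tuple[float, bool]:
    """Return (unexplained units, whether the books balance within tolerance).

    A positive unexplained figure means some losses were not recorded.
    """
    expected_output = standard_rate * production_hours
    unexplained = expected_output - (actual_output + sum(losses.values()))
    return unexplained, abs(unexplained) <= tolerance * expected_output


# Hypothetical week: standard rate 100 units/h for 120 h in production.
losses = {"rate variance": 900.0, "quality loss": 300.0}
gap, ok = reconcile(10_800.0, losses, standard_rate=100.0, production_hours=120.0)
print(f"Unexplained output: {gap:.0f} units, reconciled: {ok}")
```

A persistent unexplained gap points to losses that are not being coded, which is usually easier to find while the system is new.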
The current performance of an asset can be
established once the data collection system is
implemented. It may take some time to gather
sufficient data to understand average
performance levels, as there could be considerable
variation over time.
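One simple way to judge whether enough data has been gathered is to watch the standard error of the average shrink as periods accumulate, since it falls with the square root of the number of periods. The sketch below assumes performance is recorded as one availability fraction per week; the function name `summarise` and the figures are invented for illustration.

```python
import math
from statistics import mean, stdev


def summarise(weekly_availability: list[float]) -> tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(weekly_availability)
    if n < 2:
        raise ValueError("need at least two periods to estimate variation")
    avg = mean(weekly_availability)
    sd = stdev(weekly_availability)  # sample standard deviation
    return avg, sd, sd / math.sqrt(n)


weeks = [0.82, 0.78, 0.85, 0.80, 0.75, 0.84]  # hypothetical weekly availability
avg, sd, se = summarise(weeks)
print(f"mean {avg:.1%}, std dev {sd:.1%}, standard error {se:.1%}")
```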
The establishment of the inherent performance level
assumes that RCM or PMO2000™ has been undertaken.
The process commonly used to determine the inherent
performance level is to collect all the loss data
from a recent period long enough to be valid, and to
consider which failure modes are treated with PM and
which will be left to be repaired when they fail.
By assuming that the failure modes that now have PM
done at the correct interval would have been planned
maintenance activities rather than breakdowns, it is
possible to predict what the performance would have
been under the new maintenance strategy.
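The prediction step can be sketched as a re-classification of the historical loss data. The sketch assumes breakdown records are available as (failure mode, downtime hours) pairs and that, for each failure mode now covered by PM, the analyst estimates the planned downtime each occurrence would have cost instead (possibly zero if the work can be done while the plant is not required for production). Failure modes left as NSM stay as breakdowns. The function name `predicted_performance` and all figures are hypothetical.

```python
def predicted_performance(
    total_hours: float,
    breakdowns: list[tuple[str, float]],
    planned_hours_if_pm: dict[str, float],
) -> tuple[float, float]:
    """Return (historical, predicted) performance as fractions of total time.

    breakdowns:          (failure mode, downtime hours) from the historical period
    planned_hours_if_pm: failure modes now covered by PM, mapped to the assumed
                         planned downtime that replaces each breakdown
    """
    historical_loss = sum(hours for _, hours in breakdowns)
    # PM-covered modes: planned downtime replaces the breakdown downtime.
    # All other modes (e.g. NSM) keep their recorded breakdown downtime.
    predicted_loss = sum(
        planned_hours_if_pm.get(mode, hours) for mode, hours in breakdowns
    )
    return (
        (total_hours - historical_loss) / total_hours,
        (total_hours - predicted_loss) / total_hours,
    )


history = [("belt tear", 10.0), ("bearing seizure", 14.0),
           ("belt tear", 8.0), ("motor trip", 4.0)]
before, after = predicted_performance(720.0, history, {"bearing seizure": 3.0})
print(f"historical {before:.1%}, predicted {after:.1%}")
```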
From the data gathered, or from discussions with
people close to the plant, determine the main
causes of downtime and attempt to quantify them.
Conduct workshops to establish causal relationships
and create an improvement plan. It is highly
recommended that the workshops involve the people
who operate the plant and collect the data, as this
will build a sense of ownership in the improvement
plan.
In this step, formal RCA workshops can be used.
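The text leaves open how the main causes are quantified; a Pareto ranking of downtime by cause is one common choice (ours, not prescribed by the paper) and a useful input to the workshops. A minimal sketch, assuming loss records are (cause, hours) pairs; the function name `pareto` and the figures are invented.

```python
from collections import defaultdict


def pareto(records: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Rank causes by total downtime; return (cause, hours, cumulative %) rows."""
    totals = defaultdict(float)
    for cause, hours in records:
        totals[cause] += hours
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0.0
    for cause, hours in ranked:
        cumulative += hours
        rows.append((cause, hours, 100.0 * cumulative / grand_total))
    return rows


records = [("blocked chute", 6.0), ("belt tear", 10.0),
           ("blocked chute", 9.0), ("motor trip", 5.0)]
for cause, hours, cum in pareto(records):
    print(f"{cause:<15}{hours:>6.1f} h{cum:>7.1f}%")
```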
Once the improvement plan is created, the categories
for improvement need to be reviewed to ensure that
the data collected can show whether the improvement
plan is working.
The data collected should indicate whether the
improvement strategy is working. If the strategy is
not working, then a new one needs to be created or
the problem needs to be listed as inherent in the
system.
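As a rough illustration of checking whether a strategy is working, the sketch below compares mean weekly loss hours for one cause before and after the change against a target reduction. The function name `improvement_working`, the 10% target and the data are hypothetical, and a simple comparison of means does not rule out normal week-to-week variation, so a longer run of data gives more confidence.

```python
from statistics import mean


def improvement_working(before: list[float], after: list[float],
                        target_reduction: float = 0.10) -> bool:
    """True if mean losses fell by at least target_reduction (a fraction)."""
    baseline = mean(before)
    return mean(after) <= baseline * (1.0 - target_reduction)


before = [12.0, 15.0, 11.0, 14.0]  # hypothetical weekly loss hours, old strategy
after = [9.0, 8.0, 10.0, 9.0]      # same cause, after the improvement plan
if improvement_working(before, after):
    print("working")
else:
    print("not working: create a new strategy or accept the loss as inherent")
```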
The most common problems organisations have with
incident management systems are as follows:
·
The foundation of PMO work has not been done.
·
There is no-one in the organisation who is
responsible for administering the system.
·
The system is fragmented and cumbersome. Reports get
lost and nobody is keeping track of things.
·
Getting information is time consuming. The CMMS, OEE
and PMO systems are not integrated.
·
Too many incidents are being investigated at once.
·
The data collection strategy is too generic and
lacks definition.
o
There is little knowledge of what problems are being
looked at and for what reasons.
·
The systems are too cumbersome to get data into and
out of.
·
Too much data is being collected.
·
More than one system is collecting the same data.
This:
o
Frustrates the people who collect the data, and
o
Leads to arguments about the data rather than a focus
on solutions.
·
No-one tracks whether the improvements worked or not;
they are not integrated with the OEE system.
·
The people who collect the data are not involved in
using it to solve problems.
o
This results in poor data quality.
·
The data is collected at the wrong level and cannot be
interrogated according to the necessary parameters.
For example, in a manufacturing plant, data may be
collected against the line, so investigations
comparing the performance of different products cannot
easily be done.
·
Managers believe that their CMMS can be configured to
perform Reliability Assurance.
References:
Johnson L P (1995) “Improving Equipment Reliability and
Plant Efficiency through PM Optimisation at Kewaunee
Nuclear Power Plant”. SMRP 3rd Annual Conference,
Chicago, Illinois.
Moubray J M (1997) “Reliability-centred Maintenance”.
Butterworth-Heinemann, Oxford.
Nowlan F S and Heap H (1978) “Reliability-centred
Maintenance”. National Technical Information Service,
US Department of Commerce, Springfield, Virginia.
Ledet W (1994) “Rational Considerations – Systems
Dynamics Model (The Manufacturing Game)”. Goal/QPC
Conference, Boston, MA, USA, November 1994.
Turner S J (2001) “PM Optimisation – Maintenance
Analysis of the Future”. ICOMS Annual Conference,
Melbourne, 2001.
Footnotes