Considerations for Planning and Scheduling Part 2
Reactive Maintenance Planning
by
Howard W Penrose, Ph.D., CMRP
President, SUCCESS by DESIGN
Introduction
In the first paper we
discussed the Workflow Concept (WFC) and Design for Maintenance
(DFM) processes for improved accuracy in planning and scheduling
planned maintenance tasks. The other type of maintenance that
occurs, regardless of the type of maintenance program in place, is
reactive maintenance as the result of random failure. All
systems have the chance of failing unexpectedly, so methods must
be in place to handle these situations in order to have the
least impact on the planning and scheduling process.
Where programs are advancing towards higher levels of
maintenance, random equipment failure can be an opportunity for
maintenance. However, the reaction is often to over-react and
over-provide resources to the problem at hand. The challenge is
that too few resources, or too many resources, will both have
the same negative impact on solving the reactive issue. The
opportunity can be outlined in a reactive maintenance plan for
specific equipment in which there is a method of fault
identification, fault rectification, root-cause-analysis at
whatever level is appropriate, and planned maintenance to be
performed when the machine or system is idle.
In this paper, we will provide an overview of how the reactive
maintenance plan can be developed. In Part 3, we will discuss
how to blend the Reactive Maintenance Plan with the Planned
Maintenance program.
The Reactive Maintenance Process
Once a system or component ceases to perform the function
required by the owner, the equipment is considered failed. At
this point, the random fault has occurred with an urgency based
upon the criticality of the equipment. The correct process to
address the failure is as follows:
- Fault Identification: At this point, discovery of the fault
  occurs, the failure is controlled and troubleshooting is
  performed;
- Fault Rectification: This is the repair or replacement of the
  failed item;
- Root-Cause-Analysis: Using the evidence and findings of the
  fault and fault rectification, an RCA should be performed.
  The depth of the RCA should directly relate to the
  criticality. For instance, for a minor failure that is not
  repetitive or does not meet a pre-set value, a simple 5-Why
  process may be followed. For a critical failure, or one
  that exceeds a pre-set value, a more rigorous process should
  be followed;
- Additional Planned Maintenance: Planned maintenance that can
  be performed on the faulted equipment while it is down should
  be considered. This may include additional testing to detect
  latent problems.
In order to ensure that these steps are performed as effectively
as possible, a written process must be developed.
Fault Identification
Troubleshooting equipment or system failure can be time
consuming and dependent upon the skills and knowledge of the
maintenance first responders. Knowing the available skills, as
outlined in Part 1, the maintenance planner can select the
appropriate first responders. The remaining challenge is the
troubleshooting itself and the correct selection of inspection
techniques and technologies.
The tools that can be used to determine the appropriate
troubleshooting and inspection techniques include the results
from Reliability-Centered Maintenance (the Failure Modes and
Effects Analysis), a Failure Modes, Effects and Criticality
Analysis (FMECA), the manufacturers’ manuals, historical data,
knowledge capture and/or other processes such as
Root-Cause-Analysis studies. The results from each of these
sources should be put in the form of a logic analysis or
troubleshooting chart, as shown in Attachment 1.
The development of such a chart involves, first, a combination
of the above information as well as the instrumentation
available and the abilities of the maintenance personnel. Such
a chart provides direct troubleshooting guidance as well as
confirmation tests, inspections and pass/fail values.
The long-term benefit of such charts, in particular for critical
machines, is greater control over the time, effort and selection
of skills through the understanding of the length of time such
tasks should take. This information can come from the time
studies performed for preventive maintenance.
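As a minimal sketch of how such a chart could be captured in a form that also carries time estimates, the Python below models each decision box as a test with a pass branch and a fail branch. The class name Step, the function walk, and every test name, finding and time shown are hypothetical placeholders for illustration. They are not taken from Attachment 1.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Step:
    """One decision box on the chart: a test with a pass/fail outcome."""
    test: str                     # what the technician checks
    est_minutes: int              # time-study estimate (hypothetical here)
    on_pass: Union["Step", str]   # next step, or a finding
    on_fail: Union["Step", str]


def walk(chart: Step, results: Dict[str, bool]) -> Tuple[str, int]:
    """Follow the chart using recorded results (True = pass).

    Returns the finding reached and the summed estimated minutes.
    """
    node: Union[Step, str] = chart
    minutes = 0
    while isinstance(node, Step):
        minutes += node.est_minutes
        node = node.on_pass if results[node.test] else node.on_fail
    return node, minutes


# Hypothetical two-step chart; tests, times and findings are placeholders.
CHART = Step(
    test="circuit test from starter",
    est_minutes=15,
    on_pass="circuit good: check driven equipment",
    on_fail=Step(
        test="cable test with connection split",
        est_minutes=20,
        on_pass="motor at fault",
        on_fail="cable at fault",
    ),
)

finding, minutes = walk(CHART, {
    "circuit test from starter": False,
    "cable test with connection split": False,
})
print(finding, minutes)
```

Storing the estimated minutes on each box is one way to let the planner total the expected troubleshooting time along whichever branch the first responders follow.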
It should be noted that some failures will require efforts well
beyond the ability of these charts, which would instead provide
a guideline. However, they will reduce the time to troubleshoot
and bring a system back online very quickly, controlling the
impact of random failures as well as providing information on
the number, type and capability of personnel required.
Fault Rectification
The fault rectification process requires that repair
specifications be developed for outsourced repair, along with
internal best practices/procedures for common fault repair or
replacement. The development of an overall spares
identification program combined with agreements with vendors
will also provide a level of stability and control over the
random failure.
Fault rectification information can be an extension to the
troubleshooting charts mentioned in the previous section. The
development of the process/procedures will assist in the
development of training and the selection of vendors, and will
give an early indication of when outsourcing is required.
Root-Cause-Analysis
Root-Cause-Analysis (RCA) is a critical process for Reactive
Planning. The criticality of the random failure, and how
repetitive the failure is, will determine how rigorous the RCA
process should be.
Attachment 2 is an example of a simple RCA 5-Why process whose
information can be used to improve both the planned maintenance
system and the reactive maintenance process.
Trigger points should be selected based upon
the severity of the failure. For instance, if the failure is
not repetitive and its impact does not exceed $100,000 or delay
delivery by more than 4 hours, then a 5-Why analysis is performed.
If it exceeds those limits, a more rigorous program
requiring greater experience is justified, such as the
application of the PROACT system of RCA.
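The trigger-point rule can be written down as a short decision function. The sketch below uses the example limits given above ($100,000 and 4 hours) as defaults. The function and parameter names are hypothetical, and it assumes that a failure exactly at a limit still qualifies for the 5-Why analysis.

```python
def select_rca_method(
    repetitive: bool,
    cost_usd: float,
    delivery_delay_hours: float,
    cost_limit_usd: float = 100_000,   # example limit from the text
    delay_limit_hours: float = 4,      # example limit from the text
) -> str:
    """Return the RCA depth suggested by the trigger points."""
    exceeds_limits = (
        cost_usd > cost_limit_usd
        or delivery_delay_hours > delay_limit_hours
    )
    if repetitive or exceeds_limits:
        return "rigorous RCA"
    return "5-Why"


# A one-off failure costing $20,000 with a 1.5 hour delivery delay
print(select_rca_method(False, 20_000, 1.5))
```

Each site would substitute its own limits, since the values should reflect the criticality of the equipment concerned.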
Additional Planned Maintenance
As part of the Reactive Maintenance Plan, any additional planned
maintenance should be added. For instance, if a fan system
motor fails, inspection of belts, sheaves, fan bearings,
cleanliness, and other inspections can be performed. These
should be planned around the availability of the personnel
assigned to the random fault, because personnel are often idle
for much of the time they are on location. Random faults should be considered
an opportunity to inspect and improve availability of the system
once it comes back online.
Time Planning of Reactive Maintenance
Random faults should be considered a ‘job shop’ style process
that qualifies for the application of a Critical Path Method (CPM)
for determining how much time is required to perform the
maintenance, especially because there can be a minimum and
maximum time for each sub-task. The times for the CPM should be
obtained from the time studies performed for preventive
maintenance plus any historical times. The three times selected
for each branch of the CPM are the fastest, average and
slowest.
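To illustrate how the three times carry through a CPM network, the sketch below finds the longest path through a small task network once for each set of times. The task names, their sequence and all of the minute values are hypothetical placeholders, not time-study results. The function critical_path_minutes is likewise an assumption made for illustration.

```python
# task -> ((fastest, average, slowest) minutes, [predecessor tasks])
# All task names and times are hypothetical placeholders.
TASKS = {
    "troubleshoot": ((10, 20, 40), []),
    "inspect":      ((15, 25, 30), []),
    "get_parts":    ((20, 45, 90), ["troubleshoot"]),
    "repair":       ((30, 50, 80), ["get_parts", "inspect"]),
    "restart":      ((5, 10, 15),  ["repair"]),
}


def critical_path_minutes(tasks, which):
    """Length of the longest path through the network.

    `which` selects the time used: 0 = fastest, 1 = average, 2 = slowest.
    """
    finish = {}

    def earliest_finish(name):
        # A task starts when its latest predecessor has finished.
        if name not in finish:
            times, preds = tasks[name]
            start = max((earliest_finish(p) for p in preds), default=0)
            finish[name] = start + times[which]
        return finish[name]

    return max(earliest_finish(name) for name in tasks)


fastest, average, slowest = (critical_path_minutes(TASKS, i) for i in range(3))
print(fastest, average, slowest)
```

Tasks that run in parallel, such as the troubleshooting and inspection branches here, only add to the total when they sit on the longest path.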
In this example, we will discuss a 500 horsepower electric motor
and pump application. The motor fails to start and trips
immediately. The trouble chart is reviewed and a technician
with an MCA (Motor Circuit Analysis) device is sent out as well
as a second technician to check the pump seal packing and the
alignment if the motor winding is good. The motor is checked
from the starter, following appropriate safety rules. A problem
is found, so the motor connection box is opened, the connection
is split and both the cable and motor are tested. In the meantime,
the second technician is checking the packing. The cable is
found to be bad and new cable is obtained. Once the material is
provided, both technicians are used to install the new cable and
the machine is energized. The 5-Why analysis is performed and
it is determined that a previous FMEA did not identify cable
testing as a requirement on this machine. It is determined,
however, that the test is not cost effective on its own, and it
is decided that MCA will be performed on a quarterly basis.
In a PERT chart, three numbers are shown associated with each
task. These are the minimum time, the average time and the
maximum time. The CPM is then presented as shown in Figure 1.
The advantage of this type of chart is that it can be
hand-sketched if one does not already exist.

While this example is very simple, it does demonstrate the
process. In fact, it now gives us three numbers associated with
the reactive fault: Minimum: 95 minutes; Average: 180 minutes;
and Maximum: 290 minutes. We also know that the number of
personnel required for this project is two.
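If a single planning figure is wanted from these three numbers, one common option is the conventional PERT weighted estimate, (minimum + 4 x most likely + maximum) / 6. The sketch below applies it to the totals above. Note the assumptions: the weighting is the standard PERT convention rather than something specified in this paper, and the paper's average time is treated here as the most likely time.

```python
def pert_estimate(minimum, most_likely, maximum):
    """Conventional PERT weighted mean and spread, in the input units."""
    expected = (minimum + 4 * most_likely + maximum) / 6
    spread = (maximum - minimum) / 6   # conventional PERT standard deviation
    return expected, spread


# Totals from the motor and pump example, in minutes
expected, spread = pert_estimate(95, 180, 290)
print(round(expected, 1), spread)
```

The result sits close to the average time, which is expected when the minimum and maximum are roughly balanced around it.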
The numbers for the PERT chart can be obtained from time studies
performed for preventive maintenance. Additional times, such as
disconnecting both sides of the cable and cable installation,
will require separate time studies performed in the same manner
as described in Part 1.
Conclusion
While not as precise as the planned maintenance portion of
planning and scheduling, the reactive maintenance process can be
brought under some level of control. This is done through the
development of a reactive maintenance plan which encompasses
fault identification, fault rectification, root-cause-analysis
and the performance of other planned maintenance practices on
the faulted equipment. Such a plan allows the planner/scheduler
and management to estimate the time on task, assign the correct
type and number of personnel, confirm availability of parts and
determine if steps can be performed to avoid future serious or
repetitive problems. In the next paper, we will discuss how to
combine the planned and reactive planning components followed by
a paper on developing your maintenance budget around this
process.
About the Author
Howard W Penrose, Ph.D., CMRP, is the President of SUCCESS by
DESIGN Reliability Services. SUCCESS by DESIGN specializes in
corporate maintenance program development, motor management
programs and maintenance and motor diagnostics training. For
more information, or questions, see http://www.motordoc.net,
contact info@motordoc.net or call 800 392-9025 (USA) or
860 577-8537 (World-Wide).
Attachment 1: Sample Troubleshooting Chart 3-Phase Induction Motor (20K PDF)
Attachment 2: 5-Why Analysis Best Practice (50K PDF)