The Ups and Downs of Reliability Engineering and CMMS Implementation at Lone Star Steel
by Allen Strickland, CMRP, Reliability Engineer, Lone Star Steel Company
Creating a structured reliability engineering department in a
facility that has never had one is challenging enough. If you
simultaneously implement a new computerized maintenance
management system (CMMS), the hurdles get higher. The
key to success is to have the right management support, good
communication and a clear vision of what the future should be.
This paper will discuss some of the triumphs and pitfalls that
we have encountered on our unending journey through a complex
culture change.
First, a little history about Lone Star Steel. We are a 54-year-old
steel mill located in the piney woods of Northeast Texas, about
120 miles east of Dallas. The original facility was started by
the United States government during World War II in an effort to
geographically diversify the nation’s steelmaking and coking coal
usage beyond the northeastern United States.
The original facility consisted of an ore mining operation, a
blast furnace and a cast iron pipe facility. With the end of
World War II, the facility became a private enterprise. The
early 1950s saw the addition of a 4HI Steckel rolling mill for
rolling slabs into coils and the installation of two ERW pipe
mills. Over the following decades, two electric arc furnaces
were added along with heat treating, pipe finishing and
specialty tubing facilities. Lone Star Steel was the first
domestic tube and pipe producer to receive ISO 9001
certification. Even today, much of the major original equipment
is still in use after more than 50 years of service. Spare parts
sometimes have to be machined, since in many cases the original
equipment manufacturers went out of business years ago.
From a maintenance perspective, Lone Star Steel has historically
been in a reactive maintenance mode. Over the years, we became
expert “fire fighters”, possessing the ability to fix almost
anything in rapid fashion in order to restore production. This
ability to respond to emergency situations earned us praise
and recognition, when all the while we should have been focusing
on eliminating the emergencies in the first place. Firefighting
became our culture.
It is what we knew how to do and do well. To do anything else
would require a major culture change within the whole
organization. And the more time we spent fighting fires, the
less time we had to shift toward preventive work. But the
reality was that without this culture change, we would be
doomed to ever-decreasing production as the lack of preventive
(PM) and predictive (PdM) maintenance took a larger and larger
toll on our equipment.
Now I don’t mean to imply that we were devoid of preventive or
predictive maintenance. We had a fairly good lubrication program
in place, and we worked hard at keeping equipment adjusted,
tightened and cleaned. We were good at checking for wear on
mechanical and electrical components during weekly downturns if
time permitted. We had a crane crew whose only job was to
perform regular inspections and repairs to the cranes
plant-wide. We normally took three- to seven-day annual outages
on each department to do major repairs and modifications as well
as to install and commission capital projects. But the
percentage of our time and resources that went to PM and PdM
tasks as opposed to recovering from failures was small.
Prior to the implementation of the CMMS and Reliability
Engineering group, the structure at Lone Star Steel consisted of
seven operating departments, with each having a fairly
stand-alone maintenance organization. These stand-alone
organizations typically consisted of a maintenance
superintendent who had one or more mechanical and electrical
foremen reporting to him. The day-to-day activities of the
bargaining unit work force were then directed by the foremen.
Additionally, there were centralized machine shops, welding
shops, carpenter shops, electric shops, instrument shops,
millwright shops and fleet maintenance shops that provided
service to the entire plant. The entire plant maintenance
organization fell under one manager. The organization chart of
this group resembled the diagram shown in Figure 1 below. Note
that only three operating areas are shown instead of seven for
clarity.

Over the years, a couple of the departments’ maintenance groups
dabbled in some form of CMMS, but most departments did not use
one. Where a CMMS was used, it was not used in a structured way,
and the data captured, while useful, was not necessarily
comprehensive. Therefore, the data did not always tell the whole
story of what was being done to the equipment. In July of 2005,
the first-ever plant-wide CMMS at Lone Star Steel was launched;
the software chosen was TabWare.
To facilitate the implementation of TabWare, an outside
consulting group was brought in to assist in training and in
constructing the original equipment lists, hierarchies and work
order system.
The actual implementation strategy was planned as follows.
First, the equipment lists and hierarchies were established in
TabWare for each department. This consisted of entering
equipment data and descriptions, assigning each piece a unique
equipment number and coding the equipment to the correct
departmental cost center. Then, the correct parent/child
relationships were established in the hierarchy between all
associated pieces of equipment. Next, we gathered all of the
existing PM documents in the departments that had them, and
began writing master preventive maintenance plans into TabWare
to mirror the PM tasks already being performed. In
departments or areas where no CMMS or formal system was in use,
we applied the master plans for similar pieces of equipment
located elsewhere in the plant in order to ensure that we had
basic PM coverage for the majority of our equipment. Initially,
we had approximately 300 master plans in TabWare which were
executed against thousands of pieces of equipment.
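The parent/child hierarchy described above is essentially a tree. The Python sketch below is a hypothetical illustration, not TabWare's data model: the `Equipment` class, the sample equipment numbers and cost centers, the plan ID, and the `uncovered` helper are all assumptions. It builds a small hierarchy and lists equipment that no master plan has been assigned to yet, which is one way to check for the basic PM coverage described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Equipment:
    """Hypothetical equipment record; not TabWare's actual schema."""
    number: str        # unique equipment number
    description: str
    cost_center: str   # departmental cost center the equipment is coded to
    # repr/compare disabled on parent to avoid infinite recursion through the cycle
    parent: Equipment | None = field(default=None, repr=False, compare=False)
    children: list[Equipment] = field(default_factory=list)

    def add_child(self, child: Equipment) -> None:
        # Record the parent/child relationship in both directions.
        child.parent = self
        self.children.append(child)

    def descendants(self):
        # Depth-first walk of everything below this node.
        for child in self.children:
            yield child
            yield from child.descendants()


def uncovered(root: Equipment, assignments: dict[str, list[str]]) -> list[str]:
    """Equipment numbers at or below `root` with no master plan assigned."""
    covered = {num for nums in assignments.values() for num in nums}
    return [e.number for e in (root, *root.descendants()) if e.number not in covered]


# Illustrative hierarchy: mill -> finishing stand -> drive motor
mill = Equipment("M-100", "Rolling mill", "CC-41")
stand = Equipment("M-110", "Finishing stand", "CC-41")
motor = Equipment("M-111", "Stand drive motor", "CC-41")
mill.add_child(stand)
stand.add_child(motor)

# Hypothetical interim ("I"-prefixed) master plan assigned to one piece of equipment
assignments = {"I-0001": ["M-111"]}
print(uncovered(mill, assignments))  # ['M-100', 'M-110']
```

Keeping the relationship in both directions makes it cheap to walk down from a department to every component, or up from a failed component to its parent machine when rolling up cost.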
These master plans covered all PM task frequencies, including
weekly, monthly, quarterly, semi-annual and annual. Care was
taken when we launched the PMs to keep the workload level.
We took into consideration the distribution of monthly,
quarterly and annual tasks performed on groups of equipment so
that they did not all need to be done in the same week. The
initial assumption was that all of the existing preventive
programs that were being morphed into TabWare master plans were
valid in both content and frequency of execution until proven
otherwise. In order to keep track of this assumption, all of
these initial master plans were assigned numbers beginning with
the letter “I” to denote interim. As these plans were reviewed
for content and frequency, they became validated master plans
and the “I” designation was dropped.
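Keeping the PM workload level amounts to staggering the start week of each master plan so that monthly, quarterly and annual tasks do not pile up in the same downturn. The sketch below is one simple greedy heuristic under stated assumptions: a 52-week planning year, "monthly" approximated as every 4 weeks, and an estimated labor-hours figure per plan. It is not how TabWare schedules PMs, and the plan IDs and hours are hypothetical.

```python
# Approximate intervals in weeks (assumption: 52-week year, "monthly" = every 4 weeks)
INTERVAL_WEEKS = {"weekly": 1, "monthly": 4, "quarterly": 13, "semi-annual": 26, "annual": 52}
HORIZON = 52  # number of weekly downturns planned


def level_load(plans):
    """Greedily choose a start week for each (plan_id, frequency, hours) tuple.

    Larger jobs are placed first; each plan gets the start offset whose
    busiest occurrence week is currently the lightest.
    Returns ({plan_id: start_week}, weekly_hours).
    """
    load = [0.0] * HORIZON
    starts = {}
    for plan_id, freq, hours in sorted(plans, key=lambda p: p[2], reverse=True):
        step = INTERVAL_WEEKS[freq]
        # Try every possible offset within one interval and keep the lightest peak.
        best = min(range(step),
                   key=lambda off: max(load[w] for w in range(off, HORIZON, step)))
        for week in range(best, HORIZON, step):
            load[week] += hours
        starts[plan_id] = best
    return starts, load


plans = [
    ("I-0101", "quarterly", 8),
    ("I-0102", "quarterly", 8),
    ("I-0201", "annual", 16),
    ("I-0301", "monthly", 4),
]
starts, load = level_load(plans)
print(starts)     # {'I-0201': 0, 'I-0101': 1, 'I-0102': 2, 'I-0301': 1}
print(max(load))  # 16.0
```

Starting all four plans in week 0 would stack 36 hours into a single downturn; the staggered schedule peaks at 16. A real schedule would also have to respect crew availability and annual outages, which this sketch ignores.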
At the same time that TabWare was being prepared for launch, two
new groups were formed: a planning and scheduling group and a
Reliability Engineering group. The planning and scheduling group
was staffed largely by taking one maintenance foreman from each
department and making that individual the planner for that
department. The planners reported to a Senior Planner, who in
turn reported to the Maintenance Manager. The Reliability
Engineering group also reported directly to the Maintenance
Manager. Additionally, a TabWare administrator was designated to
be the gatekeeper of the new system. This person’s function was
to control system access, grant clearances to specific system
functions and help generate metrics queried from TabWare. The
organizational chart then resembled what is shown in Figure 2.

To prepare for the launch of the new system, we also used
various public relations tools to spread the word about the
upcoming changes. One tool was a brochure that was handed out at
meetings held with the maintenance craft persons prior to
implementation. This brochure told in plain language what the
changes were going to be and why. The maintenance process was
given the acronym P.R.E.D.I.C.T.S. The meaning of the acronym is
shown in Figure 3.

The meetings in which these brochures were handed out allowed
the craft persons to ask questions about the new system. The
meetings were held in the respective departments’ lunchrooms and
were chaired by either a reliability engineer or the Senior
Planner. The superintendent of the maintenance group attending
the meeting was also present to reinforce acceptance of the
new program. In retrospect, however, we should have distributed
the brochures about a week before the meetings to give people
time to think through their questions.
So, after several months of work, we had moved from a plant that
had no coordinated CMMS with planning/scheduling and no
Reliability Engineering department, to a plant that had both.
That was the easy part. Now the real work began. The basic tools
were in place, but the culture remained unchanged. Not every
maintenance person in the organization was gung-ho about having
a CMMS that would document everything they did and how long it
took, and spit preventive maintenance work orders out at them
like clockwork. Additionally, the CMMS could be queried for
cost, safety, work order backlog or just about any other metric
you could imagine. But we came to realize that our maintenance
program had its flaws, and that not having a CMMS in place had
only kept those flaws from being exposed. Since you can’t fix
something until you know how it is broken, one advantage of a
CMMS is that you begin to see how your maintenance system is
broken. You get a clearer picture of what happens if you are
understaffed, don’t have the needed repair parts or skip a
weekly downturn. As work order history was entered into TabWare
over the ensuing months, the CMMS became a useful tool for the
Reliability Engineering group in performing root cause failure
analysis (RCFA) and failure modes and effects analysis (FMEA).
By comparing FMEA hypotheses against the tasks listed in the
master plans for the equipment in question, we could begin
iterating toward the best preventive and predictive maintenance
tasks for those pieces of equipment.
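One way to picture that comparison is as a set check between FMEA results and master plan tasks. The sketch below is a hypothetical illustration: it assumes each failure mode has already been mapped to the tasks that would prevent or detect it, and the function name, failure modes and task names are invented for the example. It returns failure modes no plan task addresses, and plan tasks that address none of the listed failure modes.

```python
def compare_fmea_to_plan(failure_modes, plan_tasks):
    """Compare FMEA hypotheses with the tasks in a master plan.

    failure_modes: {failure mode: set of tasks that would prevent or detect it}
    plan_tasks: set of task names currently in the master plan
    Returns (failure modes no plan task addresses,
             plan tasks that address no listed failure mode).
    """
    uncovered = sorted(mode for mode, tasks in failure_modes.items()
                       if not tasks & plan_tasks)
    relevant = set().union(*failure_modes.values())
    unjustified = sorted(plan_tasks - relevant)
    return uncovered, unjustified


# Hypothetical FMEA results for a motor-driven gearbox
failure_modes = {
    "bearing wear": {"vibration route", "lubricate bearings"},
    "motor winding overheating": {"thermography scan"},
    "coupling misalignment": {"laser alignment check"},
}
plan_tasks = {"lubricate bearings", "thermography scan", "wipe down guard"}

gaps, extras = compare_fmea_to_plan(failure_modes, plan_tasks)
print(gaps)    # ['coupling misalignment']
print(extras)  # ['wipe down guard']
```

Each pass suggests tasks to add for uncovered failure modes and tasks to question because no failure mode justifies them; the engineering judgment about which to act on stays with the reliability team.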
The CMMS was set up so that each work order was generated as
one of three types. First was the Emergency Work Order (EM).
These work orders were entered to document the work performed
on, and capture the cost of repairing, a piece of equipment
that had failed. The work order was entered either as the
repairs were underway, or after the repairs were completed, with
priority given to completing the repair as quickly as possible
to allow production to resume.
Since these work orders were entered in response to a failure,
the volume of this type of work order was a measure of how
reactive the maintenance system was. The second type of work
order was the Routine Work Order (RT). These were entered upon
discovery of a piece of equipment that had symptoms of impending
failure. These symptoms might be discovered by the operator of
the equipment, a maintenance craft person performing routine
inspections, or by use of one of the predictive maintenance
tools such as vibration data collection, thermography or
ultrasound. These work orders were entered and routed through
the department’s planner. The planner’s function was to plan the
weekly downturn workload, make sure the necessary materials were
on site and coordinate with departmental maintenance leaders and
contractors as needed to get each week’s work orders completed.
The third type of work order was the Preventive Work Order (PM).
These work orders were automatically generated by the CMMS as
dictated by the master plan(s) for that equipment. The
Reliability Engineering group had set up all the master plans,
including the tasks to be performed and the frequency of
execution, and assigned them to specific equipment numbers. The
CMMS then automatically generated the PMs at the required
frequency. One of the major metrics monitored was each area’s PM
compliance. This measured what percentage of the week’s PMs had
been completed on time, what percentage had been closed without
any time charged to the work order, and what percentage had
become delinquent.
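The PM compliance metric can be sketched as a simple classification of the week's PM work orders. This is a minimal Python illustration, not a TabWare query: the `PMWorkOrder` fields, the bucket names, the sample work orders, and the rule that a PM closed after its due date counts as delinquent are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PMWorkOrder:
    number: str
    due: date
    closed: Optional[date]  # None if the work order is still open
    hours_charged: float


def pm_compliance(orders):
    """Percent of a week's PM work orders in each compliance bucket.

    Assumption: the report is run after the week's due dates, so any order
    still open, or closed after its due date, counts as delinquent.
    """
    counts = {"on_time": 0, "closed_no_time": 0, "delinquent": 0}
    for wo in orders:
        if wo.closed is not None and wo.hours_charged == 0:
            counts["closed_no_time"] += 1  # closed, but no labor charged
        elif wo.closed is not None and wo.closed <= wo.due:
            counts["on_time"] += 1
        else:
            counts["delinquent"] += 1
    total = len(orders)
    return {bucket: 100.0 * n / total for bucket, n in counts.items()} if total else {}


# Hypothetical week of PM work orders
week = [
    PMWorkOrder("PM-1", date(2006, 5, 5), date(2006, 5, 4), 2.0),
    PMWorkOrder("PM-2", date(2006, 5, 5), date(2006, 5, 5), 0.0),
    PMWorkOrder("PM-3", date(2006, 5, 5), None, 0.0),
    PMWorkOrder("PM-4", date(2006, 5, 5), date(2006, 5, 9), 1.5),
]
print(pm_compliance(week))
# {'on_time': 25.0, 'closed_no_time': 25.0, 'delinquent': 50.0}
```

Checking "closed without time charged" first matters: a PM that was closed on time with no labor recorded probably was not actually performed, so it should not inflate the on-time percentage.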
As with most maintenance organizations, especially those trying
to exit from being reactive in nature, there was some backlash
as the CMMS began issuing preventive maintenance work orders.
Prior to the implementation of the CMMS and Reliability
function, each maintenance department had fixed things as they
broke or were obviously about to break, with little time left
for preventive work. Now these same maintenance departments
faced those same repairs plus weekly, monthly, quarterly,
semi-annual and annual PM work orders. Adding to the frustration,
the CMMS allowed anyone to easily query how efficiently and
effectively all of this work was being done.
The initial Reliability Engineering group consisted of two staff
reliability engineers and one consultant engineer who was part
of the consultant group assisting in the CMMS implementation.
Additionally, we had one inspector on staff who focused
primarily on the collection of vibration data. The reliability
engineers initially focused on helping to construct the
equipment lists and hierarchies and entering PM tasks already in
place.
Once the CMMS was up and running, the Reliability Engineering
group focused on the following tasks:
• Review and validate the interim master plans for content and frequency
• Create new master plans for equipment that previously had no formal PM program
• Begin performing root cause failure analysis (RCFA) and failure modes and effects analysis (FMEA) on the most critical equipment
• Perform delay analysis to focus efforts on the Top 10 delay causes
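A Top 10 delay analysis is typically a Pareto ranking of delay time by cause. The sketch below assumes delay records are available as (cause, minutes) pairs; the function name, cause names and durations are hypothetical, and the paper does not say how Lone Star Steel recorded delays.

```python
from collections import Counter


def top_delay_causes(delays, n=10):
    """Rank delay causes by total minutes (a Pareto analysis).

    delays: iterable of (cause, minutes) records.
    Returns [(cause, total minutes, cumulative % of all delay time)].
    """
    totals = Counter()
    for cause, minutes in delays:
        totals[cause] += minutes
    grand_total = sum(totals.values())
    result, running = [], 0
    for cause, minutes in totals.most_common(n):
        running += minutes
        result.append((cause, minutes, round(100 * running / grand_total, 1)))
    return result


# Hypothetical delay log
delays = [
    ("hydraulic leak", 90),
    ("roll change overrun", 45),
    ("hydraulic leak", 60),
    ("crane failure", 30),
    ("motor trip", 25),
]
print(top_delay_causes(delays, n=3))
# [('hydraulic leak', 150, 60.0), ('roll change overrun', 45, 78.0), ('crane failure', 30, 90.0)]
```

The cumulative percentage shows how much of the total delay time the top causes account for, which is what justifies concentrating RCFA effort on them first.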
We enlisted the help of Reliability Center Inc. to provide
on-site RCFA and FMEA training to the reliability engineers and
to a group from one of our operating departments where most of
our efforts would initially be focused. The training went well,
and the group began performing FMEA on the most critical
equipment in their department. A word of caution here: full-blown
RCFA and FMEA is time consuming and requires a disciplined
structure. There is a lot of data to gather, failed parts to
quarantine and analyze, personnel to interview about the failure
and a lot of meeting time required. Time was the insurmountable
hurdle. We began with twice-weekly meetings to flesh out the
initial fault trees. After six weeks, attendance began falling
off due to production and maintenance demands, and fewer and
fewer team members were given the time to keep the process
going. Additionally, the consultant reliability engineer left to
accept a new job opportunity, and at about the same time one of
our staff reliability engineers left the company. This left us
with one reliability engineer and one inspector. This is the
stage where it is easiest to just abandon the whole idea of
predictive maintenance.
To Lone Star Steel’s credit, we pursued hiring a new reliability
engineer and adding an additional inspector. While we were
understaffed, the reliability effort suffered somewhat due to a
lack of time and manpower to “run all the traps.” But the
predictive tools we were using were adding too much to the
bottom line to be put on the shelf. We had calculated that the
weekly use of our infrared camera alone was adding over $4
million in revenue annually by detecting potential failures
early enough to plan and schedule corrections before downtime
was incurred.
This entire implementation process began with the blessing, and
at the directive, of upper management. Once all of the
transition flux and upheaval began, what part did upper
management play? As is often the case, other issues come up in
the big picture of business that demand upper management’s
attention, and that sometimes means that the issues already on
the table either have to wait or find a way to resolve
themselves. In our case, one issue was that we were in the midst
of a record year and strong markets for our products. This
required management to place additional focus on business
expansion and development. The level of their involvement in the
CMMS/Reliability rollout was maintained, though, and allowed us
to get past some of the biggest hurdles. All of the smaller
issues were left to us to figure out. This did two things.
First, it forced the different maintenance groups to fight
through the transition together, sharing successes and miseries.
Second, it kept a level of autonomy in the maintenance group by
cultivating ownership of the process. Yes, sometimes those in
the “trenches” felt as if we had very little management support.
Although a stronger upper management influence might have
quelled some of the in-fighting and initial resistance to the
new system, it would also have made every aspect seem more
mandated and shoved down the maintenance organization’s throat.
This in turn would have reduced the buy-in needed at all levels.
Sometimes you just have to try to make people want to
take their medicine.
In review of our CMMS and reliability implementation, the
following points should be emphasized:
• Training: The need for training cannot be overstated. Training
in RCFA/FMEA, vibration analysis, thermography, ultrasonics and
the CMMS is absolutely essential. Don’t just train the
employees on the use of the CMMS to the point that they can just
meet the demands that the new system will place on them. Train
them to the point that they can use the system to gain knowledge
about their departments and proactively make improvements. This
not only promotes buy-in from those who will be participating,
but maximizes the payback of the implementation costs by giving
participants the tools and knowledge needed to make instant
contributions. In our case, we received the previously mentioned
RCFA/FMEA training, vibration data collection training, infrared
thermography training (Level II certification), and the two
staff reliability engineers passed the SMRP certifying exam and
received their Certified Maintenance and Reliability
Professional (CMRP) certifications.
• Communication: Fully and thoroughly discuss with all interested
parties why the implementation is being done, what the scope of
the implementation will be and what the changes in the
organizational structure will be before the implementation
begins. Make sure they understand the benefits of the
implementation and how it can make their job easier, put more
profit to the bottom line, etc. One universal truth is that
each person is most interested in how their own daily tasks will
be affected. We held pre-rollout meetings with all of our
maintenance groups and passed out brochures that described the
upcoming transition and listed the benefits that would be
achieved. Most people will not ask questions in a group session
for fear of looking “stupid.” And sometimes people do not ask
questions simply because they do not know enough yet to know
what to ask!
When you think you have adequately prepared everyone to accept a
major culture change, you have probably done only about half
enough.
• Management support: Management support has to be stated up front
and be visible. But this does not mean that there needs to be
frequent hand-holding. It simply means that everyone understands
and feels comfortable that if an impasse is reached, management
is open and willing to get the issue(s) resolved. Constant
intervention by management would diminish the process of
employees taking ownership of the changes. But management must
be committed to providing the resources necessary to make the
implementation successful.
• Don’t be afraid to alter course. There might be
a dozen paths to the desired results, and the initial path taken
may not be the best. Sometimes you don’t realize this until you
are part way along. Learn from what you’ve done, and do not be
afraid to take a different course if it provides a clearer and
more effective way to reach your maintenance and reliability
goals. No one methodology will work at every company.
The process of improvement at Lone Star Steel will never be
finished. Has it been easy? No. Will it ever be easy? No, but it
will get better as time goes on. We began the transition in
mid-2005 as a reactive maintenance organization. At that time,
the number of emergency (reactive) work orders entered into
TabWare was two to three times the number of preventive
(proactive) work orders entered. After 11 months of diligent
work, the number of work orders entered to prevent failures and
breakdowns exceeded the number entered to repair failures. This
trend is shown by the chart in Figure 4. This, combined with our
use of predictive tools, allowed us to officially claim to be
proactive in comparison with our previous culture and history.
In the months since that historic occurrence, we have continued
to widen the gap between proactive and reactive work order
volumes. Hopefully, in the near future, we will begin the next
challenge: to go from being a predominantly time-based
preventive maintenance culture to being a heavily predictive
maintenance organization.
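The crossover from reactive to proactive can be computed from monthly work order counts by type. This Python sketch assumes EM orders are reactive and that both PM and RT orders count as proactive, since RT orders are entered before a failure; the paper does not spell out exactly which types were counted, and the month labels and counts below are invented for illustration rather than taken from Figure 4.

```python
from collections import Counter

# Assumption: EM = reactive; PM and RT (entered before a failure) = proactive
PROACTIVE_TYPES = {"PM", "RT"}
REACTIVE_TYPES = {"EM"}


def monthly_mix(work_orders):
    """Count (proactive, reactive) work orders per month.

    work_orders: iterable of (month label, work order type) pairs.
    Returns {month: (proactive, reactive)} in sorted month order.
    """
    proactive, reactive = Counter(), Counter()
    for month, wo_type in work_orders:
        if wo_type in PROACTIVE_TYPES:
            proactive[month] += 1
        elif wo_type in REACTIVE_TYPES:
            reactive[month] += 1
    months = sorted(set(proactive) | set(reactive))
    return {m: (proactive[m], reactive[m]) for m in months}


def first_proactive_month(mix):
    """First month in which proactive work orders outnumber reactive ones."""
    for month, (pro, rea) in mix.items():
        if pro > rea:
            return month
    return None


# Hypothetical data: three months of work orders
orders = (
    [("M01", "EM")] * 3 + [("M01", "PM")]
    + [("M02", "EM")] * 2 + [("M02", "PM"), ("M02", "RT")]
    + [("M03", "EM")] + [("M03", "PM")] * 2
)
mix = monthly_mix(orders)
print(mix)                         # {'M01': (1, 3), 'M02': (2, 2), 'M03': (2, 1)}
print(first_proactive_month(mix))  # M03
```

Whether RT orders belong on the proactive side is a judgment call; counting only PM orders as proactive would make the crossover arrive later.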
