RCM, the Navy Way for Optimal Submarine Operations
By Timothy Allen, CMRP
Granite Reliability Group, LLC
Note: This paper was originally delivered at
RCM-2006 -
The Reliability Centered Maintenance Managers Forum in Las
Vegas
Tragedy at Sea
On January 8, 2005 the USS San Francisco was transiting the open
waters of the Pacific when she experienced arguably the heaviest
blow to befall a US Navy submarine in decades. Unbeknownst to
the ship’s crew, an undersea mountain lay directly in her path.
Suddenly the ship slammed bow first into the obstruction at 30
or more knots, severely damaging the vessel and injuring almost
all aboard. Tragically, a US sailor lost his life as a result of
the accident. More lives could have been lost, however. The bow
of the ship lost almost all ballast tanks, significantly
degrading buoyancy and the ability to recover from 500 feet
below the ocean’s surface. However, due to the heroics of her
crew, and the ruggedness of the vessel, the boat did recover
from the head-on collision. That this boat did return from the
depths is something of a miracle and it should reflect well on
all those that support submarine design, construction and
operations. One should also consider the Navy’s maintenance and
reliability strategies over the past years to preserve and
enhance system integrity. No one single factor saved the San
Francisco, and as people in all industries study risk, as it is
associated with their plants or platforms, the USS San Francisco
serves as an example of what can go wrong.
Maintenance Planning
In 1995 the Submarine Maintenance Engineering, Planning and
Procurement Activity (SUBMEPP) – a Kittery, Maine field activity
for Naval Sea Systems Command (NAVSEA) - began classical RCM
analysis for non-nuclear submarine systems and equipment.
SUBMEPP, as NAVSEA’s technical agent for submarine non-nuclear
life cycle maintenance planning, provides maintenance products
and engineering services to the fleet. In the decade that has
followed, much has been learned and discovered through the
application of Reliability-Centered Maintenance. This paper
shall focus on some of the surprising results for the US Navy
realized through the utilization of RCM, and shall also reflect
on the importance and proper application of RCM for all
industries. I will also make some observations based on my
tenure as SUBMEPP’s RCM program manager from 2001 to 2005.
The US Navy has had great success conducting RCM, and one
testimony to that is that in an era of shrinking budgets and
tight funds they are still doing it - more than ever. As you may
already know, RCM is a technical discipline to preserve required
system functions, and safe plant or platform operations, at
minimum cost. In the case of the USS San Francisco, the systems
necessary to allow recovery of the ship worked. Backup systems
worked as well, not only blowing emergency high pressure air,
but also lower pressure air continuously into the remaining
ballast tanks to raise and keep afloat the severely damaged
vessel.
Stanley Nowlan and Howard Heap, under tasking for the Department
of Defense, wrote the landmark publication entitled
“Reliability-Centered Maintenance” in 1978. One of their quotes
resounds as we reflect on the San Francisco. “Not every critical
failure results in an accident; some such failures, in fact have
occurred fairly often with no serious consequences. However, the
issue is not whether such consequences are inevitable, but
whether they are possible”.
Their quote was added as emphasis to SUBMEPP’s RCM handbook this
past year. Also, renewed emphasis was placed on emergency
equipment by adding to the handbook: “For protective device,
emergent use or safety equipment failures, the situation for
which the protective device, emergent use or safety related
equipment was intended to operate must be assumed. The effects
of this worst case situation should be judged.” So, if one
were designing or maintaining a system that blows air into
submarine ballast tanks to ascend the ship, one must consider
such emergency situations.
Another emphasis was added to SUBMEPP’s RCM handbook and that
is: “Traditional RCM methodologies generally do not consider
the combination of double evident failures as having safety
effects, because most platforms, systems etc. can be secured
either immediately or within hours to prevent or mitigate second
failure consequences. Naval ships, however, are not as
fortunate. Operational ships cannot immediately obtain the
security of port or dry dock. Moreover, ship’s crew cannot
repair all failures. Therefore analysts must consider whether
the redundant, backup or complementary component(s) will be
available during the period of failure.” Simply put, the
submarine process asks the analyst whether they have a high
degree of confidence that, when a failure occurs, the redundant
or backup system intended to mitigate its effects will be
available. Items of consideration are whether the
mitigating system or equipment is subject to hidden failure, is
inherently unreliable, or has extended repair times.
Criticality
SUBMEPP developed a criticality matrix to establish what level
of risk is tolerable and what risk should be mitigated. Failure
Modes and Effects Analysis is essentially a risk assessment.
Specifically, those failure modes that are determined to be
critical, shall be analyzed for the development of planned
maintenance. Criticality is the combined influence of the
severity of failure and its probability of occurrence. For each
failure mode, one of four severity levels is chosen:
Catastrophic (severe personal injury or loss of ship), Mission
(the loss or curtailment of ship’s mission), Marginal (a failure
that may impair system operation, but not cause loss of mission)
and Minor (a failure that does not have a significant effect).
It is worth noting that realistic catastrophic or
mission failures are deemed critical regardless of their
probability. Minor failures are deemed critical if they are
expected to occur once every three years or more often, because
at that frequency they are considered a maintenance burden.
Non-critical failures
are tolerated. Critical failures are evaluated to determine if
maintenance can prevent them.
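As an illustration only, the criticality rules described above
can be sketched in Python. The names below are hypothetical and
are not taken from SUBMEPP's handbook. The paper does not state
the frequency cutoff for marginal failures, so the sketch leaves
that cutoff as a parameter the caller must supply.

```python
from enum import Enum


class Severity(Enum):
    CATASTROPHIC = "catastrophic"  # severe personal injury or loss of ship
    MISSION = "mission"            # loss or curtailment of ship's mission
    MARGINAL = "marginal"          # impairs system operation, mission retained
    MINOR = "minor"                # no significant effect


def is_critical(severity, years_between_failures, marginal_threshold_years):
    """Return True if a realistic failure mode should be treated as critical.

    years_between_failures: expected time between occurrences, in years.
    marginal_threshold_years: ASSUMPTION, not given in the paper; the
    frequency cutoff applied to marginal failures.
    """
    if severity in (Severity.CATASTROPHIC, Severity.MISSION):
        # Critical regardless of probability of occurrence.
        return True
    if severity is Severity.MINOR:
        # Critical when expected once every three years or more often.
        return years_between_failures <= 3.0
    # Marginal: cutoff is not stated in the paper, so it is a parameter here.
    return years_between_failures <= marginal_threshold_years
```

A failure mode for which this returns False would be tolerated;
one for which it returns True would go on to task evaluation.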
Each critical failure mode is evaluated to determine applicable
and effective maintenance requirements. The criteria for
maintenance are dependent on the severity of failure and the
evidency of failure. If the failure is safety related,
preventive maintenance is required and must reduce the risk of
failure to an acceptable level (essentially zero chance of
occurrence). If the failure is mission critical, preventive
maintenance is desired if cost effective relative to the cost of
mission loss. If the failure is marginal, then the evaluation is
more of a business case - the cost of maintenance must be less
than the costs associated with not accomplishing it. If the
failure is hidden and safety related, then maintenance is
required to reduce the risk of multiple failures to an
acceptable level. Once these criteria are established, the
analyst will evaluate whether servicing tasks, condition
directed tasks, time directed tasks or failure finding tasks are
applicable and effective in preventing failure.
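The criteria in the preceding paragraph can be expressed as a
small decision sketch. This is a simplified reading, not
SUBMEPP's actual logic: the function names and severity labels
are hypothetical, and treating hidden failures that are not
safety related under the ordinary cost criteria is an assumption,
since the paper addresses only the hidden and safety related
case.

```python
def effectiveness_criterion(severity, hidden):
    """Return the test a candidate task must pass for a critical failure mode.

    severity: "safety", "mission", or "marginal" (labels are illustrative).
    hidden:   True if the failure is not evident during normal operation.
    """
    if severity == "safety":
        if hidden:
            return "reduce risk of multiple failures to an acceptable level"
        return "reduce risk of failure to an acceptable level"
    if severity == "mission":
        # ASSUMPTION: hidden non-safety failures use the same cost test.
        return "cost less than the cost of mission loss"
    if severity == "marginal":
        return "cost less than the cost of not doing the maintenance"
    raise ValueError(f"unknown severity: {severity!r}")


def task_is_justified(severity, task_cost, consequence_cost, risk_acceptable):
    """Apply the criterion: safety is judged on risk, the rest on cost."""
    if severity == "safety":
        # Maintenance is required; the task only qualifies if it brings
        # the residual risk down to an acceptable level.
        return risk_acceptable
    # Mission and marginal failures are judged as a cost comparison.
    return task_cost < consequence_cost
```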
Applicability and Effectiveness of Time Directed Tasks
In 1961 a joint task force consisting of FAA (Federal Aviation
Administration) and US airline company representatives reported
its findings on the effect of scheduled maintenance and aircraft
reliability. They stated “In the past, a great deal of emphasis
has been placed on the control of overhaul periods to provide a
satisfactory level of reliability. After careful study, the
Committee is convinced that reliability and overhaul time
control are not necessarily directly associated topics.” Further
studies that also supported this precept and efforts to
determine just what does maintain reliability, led to a new
discipline which eventually became known as
“Reliability-Centered Maintenance” – a set of principles and
methodology to objectively determine the appropriate type and
level of maintenance to maintain required asset functionality.
Inherent to most RCM seminars is the presentation of the Age and
Reliability patterns displayed in figure 1.

Figure 1. Age and Reliability Pattern Categories
The graphs depict equipment failure rates (y-axis) vs. service
time (x-axis). These curves and the associated population
percentile applicabilities have helped dispel the long held
notion that equipment reliability fits the so-called “bathtub
curve”. The bathtub curve theory, which postulates that
equipment suffers higher than normal rates of failure early in
its life (infant mortality), followed by lower and steady rates
of failure for a time period, with an eventual wear out age at
some defined time period, represents only 3-4% of sampled
equipment populations according to three studies accomplished by
United Airlines, Broberg (1973) and the U.S. Navy (1982 MSP).
While the majority of sampled equipment populations did
experience infant mortality, in general, 90% of the population
did not experience an identifiable wear out period. The Navy
results are an exception to this generalization. 20% of the
Navy population did experience an identifiable wear out period.
This has been attributed in part to the corrosive marine
environment that affected many of the sample population. Also
noteworthy was the finding that the population majority in the
Navy study did not suffer infant mortality. This has been
attributed to the fact that navy vessels, systems and components
are thoroughly tested and “run in” prior to being put into
service. Infant mortality certainly exists, but many instances
of it are not on the “radar screen”. While no one should accept
these findings at face value without reviewing them in the
context of each individual study, these curves have been used to
demonstrate the precept voiced back in 1961 – that random
failure predominates.
In 1998, SUBMEPP developed the capability to generate Age and
Reliability profiles utilizing maintenance data imported from
the Navy’s 3-M OARS (Maintenance and Material Management Open
Architecture Retrieval System). This provided the organization a
new means to objectively measure the effects of planned
maintenance to engineer optimal maintenance plans. In 2001,
after three years of generating Age and Reliability profiles,
SUBMEPP reported that the 1961 finding holds true. In the
majority of cases there was no relationship between overhaul
time and reliability. Random failure predominated.
In SUBMEPP’s study, Age and Reliability graphs were generated
for fifty-two submarine component types. These components were
as complex as communications equipment, refrigeration plants,
turbine generators and towed array handling equipment. Simple,
but vital components were analyzed as well such as hull and
backup valves, gas regulating valves, steam isolation valves and
ship’s whistle. Air dehydrators, switchboards, circuit
breakers, hatches, compressors, pumps, condensers, motor
generators, torpedo tubes, atmosphere control equipment, and
propulsion shaft bearings are all examples of the type of
equipment that comprised the study’s fifty-two component
sample.
71% of the components profiled by SUBMEPP experienced a steady
state of random failure after their early years of operation.
Some of the components in this group did experience infant
mortality or short-lived increases in their rates of failure.
This compared generally well with the UAL (89%), Broberg (92%)
and MSP (77%) studies. As mentioned previously, UAL and Broberg
were based on aircraft. MSP and SUBMEPP were based on navy
vessel components, so it is logical that SUBMEPP’s results
parallel MSP’s much more closely than those of UAL and Broberg.
SUBMEPP’s age and reliability characteristic findings are
categorized in figure 2 based on sample population proportions.
Only 12% of the sample supported the traditional belief that
equipment operates at a steady state of reliability and then
wears out at an identifiable time period. The remaining 17% that
demonstrated age related wear out did so at an increasing but
steady rate over their life span.

The differences between characteristics B and C may possibly be
explained by the complexity of the component. The simpler the
component and the fewer failure modes attributed to it, the more
likely that sudden wear out occurs, if indeed there is an age
and reliability relationship. Interestingly enough, all of the
components in the sample that exhibited characteristic B were
either valves or valve like in function. There was one component
that matched characteristic A and, being an electro-mechanical
device with numerous valves, it suffered predominately
electrical type failures in its early years and predominately
valve related failures in its later years.
Characteristic C components tended to be more complex than
characteristic B components. Complex components have multiple
modes of failure and those individual modes may fit
characteristic B when viewed in isolation. However, wear out
patterns among these
individual modes tend to occur at different times and when
viewed in the aggregate, the overall failure rate pattern
matches characteristic C.
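The aggregation effect can be shown with a toy model. The sketch
below assumes independent failure modes, each with a flat failure
rate that steps up at its own wear out age (a characteristic B
shape), and sums them to get the component's rate. All rates and
ages are hypothetical and are not drawn from the SUBMEPP data.

```python
def mode_failure_rate(age, wearout_age, base_rate=0.01, worn_rate=0.05):
    """Characteristic B style mode: flat until wearout_age, then elevated."""
    return base_rate if age < wearout_age else worn_rate


def component_failure_rate(age, wearout_ages):
    """For independent modes, the component rate is the sum of mode rates."""
    return sum(mode_failure_rate(age, w) for w in wearout_ages)


# HYPOTHETICAL wear out ages, in months, for five modes of one component.
wearout_ages = [40, 80, 120, 160, 200]

ages = list(range(0, 240, 40))
profile = [component_failure_rate(a, wearout_ages) for a in ages]

# Each mode steps up only once, yet the summed profile climbs at every
# sampled age, which is the steadily increasing shape of characteristic C.
steadily_increasing = all(b > a for a, b in zip(profile, profile[1:]))
```

Viewed one mode at a time, each curve is flat with a single
step; only the aggregate shows the gradual rise.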
Ideally, life renewal tasks are prescribed when a characteristic
B situation occurs - just prior to the upswing in the
probability of failure. Life renewal tasks might still be
applicable and effective in a characteristic C situation. If,
for instance, it is demonstrated that a failure rate beyond a
certain percentage is undesirable, a maintenance task at that
point should return the failure rate to that found at the x-axis
origin.
8% of SUBMEPP’s sample population exhibited infant mortality
characteristics. This differs significantly from the earlier
findings of UAL and Broberg. As mentioned previously, navy
vessels go through a lengthy test period prior to entering
service. Infant mortality likely exists; however, those failures
are not captured in 3-M OARS during those test periods.
SUBMEPP’s infant mortality statistics differ from MSP as well.
32% of MSP’s sample suffered from infant mortality. Differences
may be caused by the type of equipment analyzed. The majority of
SUBMEPP’s components fitting characteristics A and F were more
electrical in nature than mechanical. Electrical devices are
more prone to sudden failure early in their life. The majority
of components in SUBMEPP’s sample were mechanical in nature,
however, and that may differ from MSP and the other studies.
Platform differences may contribute as well. SUBMEPP’s results
are derived from a sample of submarine components and MSP’s
results were derived from a sample of surface ship components.
Corrective maintenance accomplished during a submarine overhaul
is not captured by 3-M OARS. Not until the boat is delivered to
the fleet is corrective maintenance reported to 3-M OARS. Jack
Nicholas, who pioneered RCM at NAVSEA back in the 1970’s, has
written about these differences as well and he believes that the
submarine community’s detailed overhaul procedures have
mitigated the occurrences of infant mortality.
Maintenance Plan Changes
The majority of components analyzed by SUBMEPP did not
demonstrate an age and reliability relationship and
consequently, many existing time directed component overhauls
have been deleted from class maintenance plans. These deletions
have allowed the Navy a substantial cost avoidance for submarine
depot availabilities. The term avoidance is used here because
one cannot project beyond the age span of study to predict
future probabilities of failure. Components may or may not
experience failure rate increases and that will be a future
determination when maintenance strategies for these components
are revisited. SUBMEPP’s review of components does shed light on
the effectiveness of many overhaul periodicity extensions made
in the early 1980’s however. The majority of components that fit
non-wear out characteristics D, E and F once had overhaul
periodicities half as long.
The RCM approach is to extend or eliminate overhaul
periodicities in the absence of an age and reliability
relationship. The decision whether to extend the periodicity or
delete the action entirely often depends on the consequences of
failure. Extensions are more appropriate for components with
safety related failures for which no effective condition
monitoring techniques have been devised. Deletions are more
appropriate for non-safety related components. Maintenance plan
strategies should not be based entirely on failure rates viewed
at the equipment level. Individual failure modes should be
viewed in isolation as well to determine if an age and
reliability relationship exists. If so, a surgical maintenance
approach may be appropriate where only a piece part or
subassembly is replaced.
The portion of components analyzed by SUBMEPP that did
demonstrate an age and reliability relationship was further
analyzed to determine if a time directed maintenance task was
appropriate. For non-severe failures, where there are no
additional costs attributed to failure beyond material and labor
to repair the component, a fix-when-fail strategy may still be
more cost effective. Labor and overhead cost differences must be
taken into account. And if there are mission or collateral
damage costs associated with failure, condition monitoring can
sometimes be substituted for a time directed task. Condition
monitoring must detect potential failure conditions and allow a
known and sufficient time period for adequate correction. A
more surgical maintenance strategy may be appropriate as well.
Pareto’s rule, that 80% of the problems are generally caused by
20% of the actors, has been validated by RCM analyses.
Maintenance professionals should concentrate on the few “bad
actors” which degrade reliability.
Case Study
Soon after the development of the feedback data analysis
application, a SUBMEPP combat systems engineer analyzed
Trident class torpedo tubes. Torpedo tubes are comprised of
barrels, breech and muzzle doors, latches, linkages, slide
valves, rotary actuators, power cylinders, safety interlocks,
indicators and numerous other sub-assemblies. The class
maintenance plan for the torpedo tubes included a time based
maintenance action to replace hydraulic power cylinders every
160 months. Each torpedo tube has five cylinders. Functional
failures for these components are mission critical as they
render a tube inoperable, or degrade performance to an
unacceptable level. Even though there are multiple torpedo
tubes, a full complement of operational torpedo tubes is deemed
necessary for readiness. Two of the power cylinders operate the
torpedo tube slide valve. Over half of the observed discrepant
conditions associated with inoperability of the slide valve were
attributed to the hydraulic power cylinders and only one of
those discrepancies was judged to be a functional failure. The
predominant mode of failure was external leakage of hydraulic
fluid and as previously stated, these were judged to be
non-functional failures. They were potential functional failures
if left untreated. Figure 3 displays the Age and Reliability
curve for the slide valve power cylinders. The failure pattern
is random with no correlation with time. In fact, the regression
line has a slightly negative slope of -0.0003. There is no
evidence indicating that the cylinders should be replaced at 160
months. Moreover, the engineer found that existing condition
monitoring tasks were applicable in monitoring and maintaining
system health. Periodic pressure and cycle time tests are able
to detect degradation before performance is compromised, and
allow sufficient time for repair or replacement of cylinders.
Age and reliability findings for the remaining power cylinders
were similar. The engineer deleted the requirement to replace
torpedo tube power cylinders at 160 months and this lifecycle
cost avoidance for Trident class submarines was
determined to be $2.3 million. If the current reliability trend
holds consistent over the submarine lifecycle, that avoidance
will be actual savings.
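The kind of check described in this case study can be sketched
as a least-squares fit of failure rate against age. The data
below are hypothetical and are not the torpedo tube figures; the
function and variable names are illustrative only. A slope near
or below zero is treated here as the absence of an age and
reliability relationship, although a fuller analysis would also
test whether the slope is statistically significant.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# HYPOTHETICAL data, not taken from the SUBMEPP study.
age_bin_midpoints = [20, 60, 100, 140, 180]   # months in service
failures_in_bin = [3, 2, 3, 2, 2]             # failures observed per bin
units_at_risk = [100, 100, 100, 100, 100]     # components exposed per bin

# Failure rate per age bin: failures divided by units exposed in that bin.
failure_rates = [f / n for f, n in zip(failures_in_bin, units_at_risk)]
slope = least_squares_slope(age_bin_midpoints, failure_rates)

# A slope at or below zero gives no sign of age related wear out here,
# so a time directed replacement would not be supported by this sample.
shows_wear_out = slope > 0
```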

Optimization
The US Navy has found condition monitoring with specific
condition directed tasks much more cost effective and a better
strategy for maintaining fleet readiness. Using the principles
of RCM, the Navy has been able to extend overhaul periods, and
significantly reduce the amount of planned work during overhaul
availabilities. In 1994, US Navy attack submarines spent 22% of
their lifecycle in depot, being repaired or overhauled. In 1993,
that percentage was reduced to 17%. Today, attack submarines
spend only 12% of their lifecycle constrained to depot which has
allowed the US Navy to maintain the same levels of presence and
readiness with fewer ships. Optimization, however, requires
periodic monitoring of maintenance plan effectiveness, and this
is generally achieved through the review of material condition
and equipment failure data in the context of an RCM analysis.
Sometimes maintenance strategies must be altered when reviewed
against the course of time.
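The link between depot time and fleet size can be illustrated
with simple arithmetic. The sketch below assumes, as a
simplification, that presence is proportional to the fraction of
lifecycle spent out of depot; the required number of boats is a
hypothetical figure chosen only to make the numbers round.

```python
def fleet_size_for_presence(boats_needed_out_of_depot, depot_fraction):
    """Boats required so that the needed number are out of depot on average."""
    return boats_needed_out_of_depot / (1.0 - depot_fraction)


needed = 39.0  # HYPOTHETICAL requirement, not a Navy figure

fleet_at_22_percent = fleet_size_for_presence(needed, 0.22)
fleet_at_12_percent = fleet_size_for_presence(needed, 0.12)

# Ratio of fleet sizes for the same presence: 0.78 / 0.88.
relative_fleet_size = fleet_at_12_percent / fleet_at_22_percent
```

Under these assumptions, cutting depot time from 22% to 12% of
the lifecycle supports the same presence with a fleet roughly
11% smaller.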
RCM as a Knowledge System
Finally, RCM is much more than a means to develop maintenance
plans. It is a knowledge system that allows continuous
refinement of planned maintenance strategies. Why a maintenance
task was eliminated, created, or its frequency extended will be
documented in an RCM analysis. During the industrial revolution,
great knowledge was gained through trial and error, and machines
and mechanisms were developed, refined and improved as time
passed. This process was for the most part continuously
progressive, with minimal regression regarding performance,
durability or reliability of those mechanisms. One of the chief
reasons is that the knowledge was retained in the hardware
itself. One could see it, touch it, and if someone was
completely unfamiliar with it, given enough time, much of the
knowledge that went into making it could be reverse engineered.
Gains in productivity were essentially cast in stone with each
design improvement. As highly developed nations move to less of
a manufacturing base, and more to service and knowledge worker
economies, there is great potential to regress as workers in
such industries change jobs. Their work and knowledge are
usually not captured and institutionalized like a machine
maker’s is. How an asset works and operates is preserved in the
physics of the hardware. How it is produced and properly
maintained is much more perishable. Therein lies a problem, and
the necessity to build knowledge systems. RCM analysis - a top
down logical progression, organized by plants, platforms,
systems and components - provides an opportunity to capture
critical facts, assumptions and reasonings. Further augmenting
the process and producing repeatable maintenance procedures
increases effectiveness even more. A healthy RCM program will
yield a corporate technical library for the benefit of future
workers, and assist in the retention of corporate knowledge to
allow gains in productivity. Treat RCM as a long term living
program. True optimization requires a periodic look in the
rearview mirror.
References
Excerpts of this paper are an update of a paper written by the
author while employed by the US Navy in 2001, “U.S. Navy
Analysis of Submarine Maintenance Data and the Development of
Age and Reliability Profiles” and published by the American
Society of Naval Engineers.
American Management Systems, Inc., “Age Reliability Analysis
Prototype Study”, N00024-92-C-4160, November, 1993.
Drew, C., “Adrift 500 Feet Under the Sea, a Minute Was an
Eternity”, New York Times, May 18, 2005.
Michal, J., “Reliability Modeling and Estimation Using U.S. Navy
3M Maintenance Data”, Naval Postgraduate School, Monterey,
California, September, 1995.
Nowlan, F. and H. Heap, “Reliability Centered Maintenance”, MDA
903-75-C-0349, December, 1978.
Biography
Tim Allen is the senior member of Granite Reliability Group,
LLC. Tim is the former RCM Program Manager at Submarine
Maintenance Engineering, Planning and Procurement (SUBMEPP), a
Naval Sea Systems Command (NAVSEA) field activity located in
Kittery, Maine. SUBMEPP is the engineering and planning
authority for submarine lifecycle maintenance. Tim worked at
SUBMEPP for twenty years and has been instrumental in Navy
RCM since 1996. Tim represented NAVSEA's submarine RCM program,
trained system engineers in the principles and methodologies of
RCM and worked collaboratively with them to engineer cost
effective submarine class maintenance plans for all Navy
submarines. Tim received a Bachelor of Science in Mechanical
Engineering Technology at the University of Maine in 1986. In
1997, he received a Master of Business Administration degree at
New Hampshire College.