Please join Mr. H. Paul Barringer for an exclusive and very special one-day workshop at IMC-2007.

The following problems will be presented and solved:
● Which Reliability Tool To Use?
● Reliability And Data
● Pump Seal Failures
● Corrosion Problems And Inspection Data
● Forecast Future System Failures

IMC-2007 is held December 4-7, 2007 in Daytona Beach, Florida.

Which Reliability Tool should I use? By H. Paul Barringer, Barringer & Associates Inc. 
 
Publisher's note:  When a person has a hammer - everything looks like a nail.  Once a maintenance engineer learns techniques like Reliability Centered Maintenance (RCM) or Weibull analysis, it seems like they apply the technique to every potential area of failure they can find - whether RCM or Weibull analysis can add value or not.  Reliability tools must be used in the proper context to create the best result, and the more tools we understand, the better we can apply them.

We asked our favorite reliability guru, Mr. H. Paul Barringer, to help us understand what reliability tools are available to us as maintenance professionals, when we can and should use them, and what results we can expect if we apply them correctly.

To get a better education in reliability engineering for maintenance professionals, be sure to visit Mr. Barringer's web site at www.barringer1.com after you read this article.   - Terrence O'Hanlon, CMRP, Publisher

Reliability Tools:

Reliability tools exist by the dozens:  what are the tools, why use them, when should I use them, and where should I use them?  The tools are listed below, followed by answers for each.

Reliability Tools

Accelerated Testing

Design Review

HASS

OEE

Reliability Testing

Availability

Effectiveness

Life Cycle Cost

Pareto Distribution

Simultaneous Testing

Bathtub Curves

ESS

Life Units

Poisson Distribution

Software Reliability

Block Diagram Models

Events/Incidents

Load-Strength

Probability Plots

Sudden Death Testing

Capability

Exponential

Lognormal

QFD

TPM

Configuration Control

Failure

Maintainability

Reliability

Weibayes Estimates

Contract For Reliability

Failure Forecast

Maintenance

Reliability Audits

Weibull Analysis

Cost Of Unreliability

Failure Rates

Maintenance Engineering

RBDs

Weibull Database

Critical Items List

Fault Tree Analysis

Mean Time

Reliability-Centered Maintenance

 

Data

FMEA

Mechanical Component Interactions

Reliability Engineering

 

Decision Trees

FRACAS Systems

Monte Carlo

Reliability Growth

 

Dependability

HALT

Normal Distribution

Reliability Policies

 

The descriptions of these tools are brief because entire books have been written about each item.  Think of the presentations below as hors d’oeuvres (a little snack food or starters), not the main course.

The most important reliability tool is a Pareto distribution based on money, specifically on the cost of unreliability, which directs attention to the most important money problem first.  No magic bullet exists for reliability issues: don’t waste your time looking for a single magic tool, because none exists!
 

Accelerated Testing-

What:    A test method that increases loads to quickly produce age-to-failure data from only a few data points, which are then scaled to reflect normal loads.

Why:      Accelerated testing saves time and money while quantifying the relationships between stress and performance and identifying design and manufacturing deficiencies, so useful data is obtained quickly and at low cost.

When:    Usually performed during the development of devices, components, or systems.  It also applies to items that have been in service, to obtain a metric showing how the item performs under heavy loads.  Accelerated testing is a useful method for solving old, nagging problems within a production process.

Where:  Used for correlating test results with real life conditions.
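To illustrate how accelerated-test ages are scaled back to normal loads, here is a minimal Python sketch assuming the stress is temperature and that the Arrhenius model applies. The article does not name a stress-life model, so the model choice, the activation energy, the temperatures, and the test ages below are all hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_acceleration_factor(ea_ev, t_use_c, t_test_c):
    """Life at use temperature divided by life at test temperature (assumed Arrhenius model)."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_test_k))


def scale_to_use(test_ages, af):
    """Convert ages-to-failure observed under test stress into equivalent ages at normal stress."""
    return [age * af for age in test_ages]


# Hypothetical inputs: 0.7 eV activation energy, 40 C in service, 105 C in the test chamber.
af = arrhenius_acceleration_factor(0.7, 40.0, 105.0)
test_hours = [310.0, 455.0, 520.0]  # hypothetical ages-to-failure from the accelerated test
use_hours = scale_to_use(test_hours, af)
print(f"acceleration factor = {af:.1f}")
print("equivalent use-condition ages (h):", [round(h) for h in use_hours])
```

Because the factor is exponential in the assumed activation energy, a modest error in that value changes the scaled ages substantially, which is one reason the correlation with real-life conditions matters.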


 

Availability-

What:    A tool for measuring the percentage of time an item or system is in a state of readiness, where it is operable and can be committed to use when called upon.  Availability ceases because of a downing event, which makes the item/system unavailable to initiate a mission when called upon.  In the simplest view, availability = uptime/(uptime + downtime).  For many other definitions, see MIL-HDBK-338, section 5.

Why:      The measure tells you how much time can be committed to performing the mission, and it usually involves only simple arithmetic.

When:    The measurement is often based on past experience; its complement, unavailability (1 - availability), addresses the inability to perform the task.

Where:  In the design of a system, availability is a calculated value; in operation, it is a performance index that is often easy to use and understandable to the average person.  Today there is a great tendency to “Enronize” availability metrics by using uptime figures that present data in the best light (an issue of data integrity) and maximize managerial bonuses by excusing (deducting) downtime from the calculations to put lipstick on the pig.  Use the KISS principle.  Think of availability in terms of the investor’s typical year of 8760 hours.  The no-excuse annual metric is availability = uptime/8760, with uptime in hours.  Suddenly you’ll find a metric of great interest to investors that can be benchmarked as a financial issue, and thus motivate the management team to solve real issues of importance to the business.  Please note that you can have high availability but many failures, and thus low reliability, because availability ≠ reliability.  Likewise, you can have high availability but little output, so pair the metric with effectiveness to get the complete story.
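The two availability formulas can be compared directly. This minimal Python sketch uses a hypothetical year in which 500 hours of downtime were "excused" from the plant's own report; the numbers are illustrative only.

```python
HOURS_PER_YEAR = 8760.0  # the investor's calendar year


def availability(uptime_h, downtime_h):
    """Simplest view: uptime / (uptime + downtime), using only the downtime that was recorded."""
    return uptime_h / (uptime_h + downtime_h)


def no_excuse_availability(uptime_h, hours_in_year=HOURS_PER_YEAR):
    """No-excuse annual metric: uptime / 8760, so every lost hour counts."""
    return uptime_h / hours_in_year


# Hypothetical year: 7900 h running, 360 h of reported downtime,
# plus 500 h of downtime deducted as "excused" (7900 + 360 + 500 = 8760).
uptime = 7900.0
reported = availability(uptime, 360.0)
investor = no_excuse_availability(uptime)
print(f"reported availability:  {reported:.3f}")  # 0.956
print(f"no-excuse availability: {investor:.3f}")  # 0.902
```

The gap between the two figures comes entirely from the 500 excused hours.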


 

Bathtub Curves-

What:    The concept is derived from the human life experience of infant mortality, chance failures, and a wear-out period, for which government agencies accumulate birth and death data.  Most equipment lacks birth/death recording by government agencies, and most non-human systems can be regenerated to live and die many times before being relegated to the scrap heap.

Why:      Failure rates differ at different phases of life for both people and equipment, and the medicine applied to humans and equipment must be chosen accordingly to treat the roots of the problem effectively.

When:    The concept is useful during the design, operation, and maintenance of equipment and systems to understand the failure mechanisms.

Where:  It uses familiar human experience to help the ordinary person relate equipment/system failures to those experienced in real life, so as to coordinate the design, operation, and maintenance of equipment.  For other definitions, see MIL-HDBK-338, section 9.
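One common way to put numbers on the three bathtub zones, assuming the failure data follow a Weibull distribution with shape factor beta and scale factor eta, is the Weibull hazard (instantaneous failure rate) function: beta < 1 gives a falling rate (infant mortality), beta = 1 a constant rate (chance failures), and beta > 1 a rising rate (wear out). This is a minimal Python sketch; the characteristic life is hypothetical.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


ETA = 1000.0  # hypothetical characteristic life, hours
ZONES = [(0.5, "infant mortality"), (1.0, "chance failures"), (3.0, "wear out")]

for beta, zone in ZONES:
    early, late = weibull_hazard(100.0, beta, ETA), weibull_hazard(2000.0, beta, ETA)
    if early > late:
        trend = "falling"
    elif early < late:
        trend = "rising"
    else:
        trend = "constant"
    print(f"beta = {beta}: {zone:<16} failure rate {trend}")
```

A real bathtub curve is usually a mixture of these zones over an item's life, so a single beta describes only one zone at a time.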


 

Block Diagram Model (same as Reliability Block Diagram Models)-

What:    Reliability block diagram (RBD) models are graphical representations of a calculation methodology for system reliability.

Why:      RBD models allow calculation of system reliability from known or assumed failure details of the components, starting with the smallest component and growing the model to the complete system so that system performance is predicted from its elements.

When:    RBDs are used in up-front designs as a performance parameter and, after the system is constructed, to ferret out poorly performing blocks that limit system performance.

Where:  Frequently used as a trade-off tool to search for the lowest long-term cost of ownership and to help sell alternative courses of action, either moderating the effects of reliability issues or overcoming poor performance with alternative designs.  The results can be calculated before the system is built, and the calculations provide knowledge about availability, the maintenance interventions required for failures, and the number of spare parts required to sustain operations.  For other definitions, see MIL-HDBK-338, sections 4 and 6.
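Growing a model from components to the system can be sketched with the two basic RBD building blocks, series and parallel (redundant) blocks, assuming independent failures. This is a minimal Python sketch; the equipment names and block reliabilities are hypothetical.

```python
from math import prod


def series(*blocks):
    """Every block must survive: R = R1 * R2 * ... * Rn."""
    return prod(blocks)


def parallel(*blocks):
    """At least one block must survive: R = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in blocks)


# Hypothetical line: feed pump -> two redundant compressors -> dryer,
# with assumed reliabilities for a one-year mission.
pump, compressor, dryer = 0.95, 0.80, 0.98
compressor_pair = parallel(compressor, compressor)  # 0.96
r_system = series(pump, compressor_pair, dryer)     # 0.95 * 0.96 * 0.98
print(f"system reliability: {r_system:.4f}")        # 0.8938
```

Swapping a block or adding redundancy where the series chain is weakest, then recomputing, is the trade-off use described above. Estimating spare parts and maintenance interventions needs failure and repair time models for each block, which is beyond this static sketch.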


 

Capability-

What:    A measure of how well product performance meets objectives.  In short, how well are the outputs actually accomplished against a standard?  Capability is frequently the product of efficiency * utilization.

Why:      Capability is a component of the effectiveness equation and usually under the control of production.

When:    Data for this metric is frequently produced by the Accounting department each month as a segment of the financial reports for the purpose of handling variances against the standards.

Where:  In the effectiveness measure, capability is frequently a weak point [as a measure of how well the production process does the job for which it was purchased] requiring substantial improvement that cannot be achieved with the usual reliability and maintainability (RAM) tools.  However, the deficiency in this metric may come from the original design of the system [an issue of design effectiveness] or from the way the system is operated [an issue of use effectiveness].


 

Configuration Control-

What:    Configuration control is concerned with the management of change, providing traceability of failures back to the design standard.  If the design details are not specified, the design will not contain the requirements, and implementation of the project will be hit or miss in achieving the desired end results, from the conceptual design through to the operating facility.

Why:      With active configuration control you know where items are used and contained, where and why they were installed, where signals originate, what items are used where and in what environments, what drawing revisions have occurred, that the product conforms to the drawings and specifications, what alternate materials/components have been used, and that test reports/certifications are available as original documents for review.

When:    Configuration control begins after the first design review, building an unbroken chain of traceability that helps avoid surprises in the field; such surprises would destroy the designed-in criteria for availability, reliability, maintainability, and cost effectiveness established as part of the original design criteria.

Where:  Frequently these documentation details are assembled into a dossier with third party witnessing for use in validating conformance to the design requirements and provided to the owner of the equipment as witness documents.


 

Contracting For Reliability-

What:    Say what you want and want what you say to your vendors.  Provide explanations of the objectives in contracts in terms the vendors will understand.

Why:      If you can’t clearly spell out the requirements for availability, reliability, and maintainability, the contractors cannot make these issues features of the design.  Thus it is important to be specific about the features the design must manifest.  Explanations such as “You know what I want and what I need, just do it quickly” are self-defeating expressions of vague generalities that lead to inferior designs and constant arguments.  Be specific about requirements for building reliability block diagrams, using quality function deployment, performing failure mode and effects analysis, conducting fault tree analysis, and finally conducting design reviews for reliability.

When:    Write the specifications before procurement begins.  Plan to spend time with your own Purchasing Department to explain the details and sell the team on the financial advantages of including reliability requirements in the specifications; likewise, spend time selling your vendors on the requirements and why they are stated.

Where:  These are up-front decisions to avoid replicating problems that were built into previous designs and never corrected.


 

Cost Of Unreliability-

What:    The cost of unreliability is a big picture view of system failure costs, described in annual terms, for a manufacturing plant as if the key elements were reduced to a series block diagram for simplicity.  It looks at the production system and reduces the complexity to a simple series system where failure of a single item/equipment/system/processing-complex causes the loss of productive output along with the total cost incurred for the failure.  If the system IS sold out, then the cost of unreliability must include all appropriate business costs such as lost gross margin plus repair costs, scrap incurred, etc.  If the system is NOT sold out, and make-up time is available in the financial year, then lost gross margin for the failure cannot be counted.  The cost of unreliability is a management concern connected to management’s two favorite metrics: time and money.
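The sold-out rule can be expressed as a small calculation. This minimal Python sketch assumes each failure record carries lost production hours, gross margin per hour, repair cost, and scrap cost. The field names and dollar figures are hypothetical, and the "etc." costs the text mentions are left out.

```python
def failure_cost(event, sold_out):
    """Cost of one failure: lost gross margin counts only when the plant is sold out."""
    lost_margin = event["lost_hours"] * event["margin_per_hour"] if sold_out else 0.0
    return lost_margin + event["repair"] + event["scrap"]


def annual_cost_of_unreliability(events, sold_out):
    """Sum the cost of every failure recorded in the financial year."""
    return sum(failure_cost(e, sold_out) for e in events)


# Hypothetical failure records for one year (field names are illustrative).
events = [
    {"lost_hours": 12, "margin_per_hour": 2000, "repair": 8000, "scrap": 1500},
    {"lost_hours": 4, "margin_per_hour": 2000, "repair": 2500, "scrap": 0},
]
print(annual_cost_of_unreliability(events, sold_out=True))   # 44000
print(annual_cost_of_unreliability(events, sold_out=False))  # 12000.0
```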

Why:      In private enterprise, failures must be considered from a financial viewpoint, not with a gear-head approach of simply counting the number of failures; you must speak the language of the enterprise, which describes events by monetary measures over a period of time.  The annual cost of failures is usually not stated in a clear-cut manner, nor are failure costs summarized by system/sub-system to identify the weak links in monetary terms.  Building a clear Pareto distribution of these costs allows appropriate action: attack the vital (high-cost) areas with an action plan to reduce failures (unreliability) and thus reduce the annual cost of unreliability.

When:    For a new plant, this can be a design criterion to limit the cost of unreliability for competitive reasons in the marketplace, i.e., by plan, the hidden costs of failures are made obvious as part of the strategic plan.  For an existing plant, this can be an exercise in defining the cost of unreliability and building a long-term plan to reduce the cost of failures as part of the tactical plan.

Where:  This activity is best performed with high level involvement of the management team to provide fundamental understanding of the size of the icebergs about to rip out the underbelly of the plant and to involve the organization in a plan to reduce the costs so that profits are pushed upward because of the improvements.  If the cost of unreliability cannot be reduced, then the costs become extra weight for the saddle bags in the race for survival.


 

Critical Items List-

What:    The critical items list is a top level summary of problems/cost used for discussions with management about key reliability issues.  The summary list converts technical details to a summary of costs and time while placing the issues into a Pareto distribution explained in terms of money and the vital few problems to be solved for competitive reasons.
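Placing issues into a Pareto distribution by money amounts to sorting annual costs from largest to smallest and tracking the cumulative share. A minimal Python sketch with hypothetical systems and costs:

```python
def pareto(costs):
    """Rank items by annual cost (largest first) with each item's cumulative share of the total."""
    total = sum(costs.values())
    running = 0.0
    rows = []
    for item, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        running += cost
        rows.append((item, cost, running / total))
    return rows


# Hypothetical annual cost of unreliability by system, in dollars.
annual_costs = {
    "boiler feed pumps": 410_000,
    "conveyors": 95_000,
    "kiln drive": 620_000,
    "instrument air": 40_000,
    "packaging line": 135_000,
}
for item, cost, cumulative in pareto(annual_costs):
    print(f"{item:<18} ${cost:>9,.0f}  cumulative {cumulative:6.1%}")
```

Here the top two systems carry about 79% of the cost, so they would head the critical items list.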

Why:      The purpose of the critical items list is to focus management’s attention on items that need to be resolved during the design phase, as a corrective action loop for influencing lifetime costs.

When:    The list starts with the first design review as issues are disclosed in design reviews for reliability.

Where:  The critical items list is presented to top level management as issues to be accepted or resolved before paper plans become steel and concrete.


 

Data-

What:    Data is the informational energy that runs the reliability improvement machine.  Data is acquired at great cost, so it needs to be retained and used to prevent future failure events.  Proper use of data provides an understanding of failure mechanisms and prevents recurrence of bad events that cause safety or high-cost failures.  Reliability data requires a definition of failure.  Failures can be catastrophic or slow degradation; you decide by defining the failures.  The units of measure for the data must be the units that drive the degradation: sometimes hours, sometimes miles, and so forth; in short, whatever motivates the failure.  The observed life of an item always ends either with a failure or with a removal from service at some age; a removal generates a category of data called suspended or censored data.  Data is information in the form of facts, figures, or engineering databases obtained from engineering tests, experiments, or actual operating conditions.  Reliability data is often incomplete, as exact times to failure are rarely known or recorded with much precision, so only partial information is available for analysis.  Reliability data comes in two forms: 1) age-to-failure data, and 2) censored/suspended data, such as occurs when unfailed items are removed from service or when they fail from a different failure mode than the one under study; this is useful information and part of the data set.  Some data is better than no data for resolving reliability issues.

Why:      Data is the information that, when used in an informed manner, helps prevent repetition of bad history and allows an enlightened, rational approach to solving a reliability issue using facts and figures.  Intelligent use of data for reliability issues provides the objective evidence needed to help solve the root causes of failures.

When:    Databases of reliability information from past experience are very helpful for predicting future failure events.  The data is helpful when failure rates, or their reciprocal expressed as mean times to failure, are recorded, although this reduces the information to an average failure rate or average time to failure.  The reliability data is particularly valuable if retained for components as a Weibull database with shape factor beta and scale factor eta.

Where:  The data is useful for understanding failure modes and for predicting future failures for a population of equipment, both during the design stage and as the equipment ages.  The role of the reliability engineer is to acquire the failure data and convert it into useful information for both current and future use.
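Why keeping beta and eta is more valuable than keeping only a mean time can be sketched in Python. The two entries in this hypothetical Weibull database are chosen to share the same 10,000 h mean time to failure; the component names and values are illustrative, not published data.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of surviving to age t: R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_mean(beta, eta):
    """Mean time to failure of a Weibull distribution: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


# Hypothetical database entries: component -> (beta, eta in hours),
# both chosen to have a 10,000 h mean time to failure.
weibull_db = {
    "component A (chance failures)": (1.0, 10_000.0),
    "component B (wear out)": (3.0, 10_000.0 / math.gamma(1.0 + 1.0 / 3.0)),
}

mission = 2_000.0  # hours
for name, (beta, eta) in weibull_db.items():
    print(f"{name}: mean = {weibull_mean(beta, eta):,.0f} h, "
          f"R({mission:,.0f} h) = {weibull_reliability(mission, beta, eta):.3f}")
```

An average failure rate alone would treat the two components as identical; the shape factor shows that the wear-out component is far less likely to fail early in life.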


 

Decision Trees-

What:    Most business decisions have considerable uncertainty, which implies at least two outcomes whenever you choose a course of action.  Making decisions in the face of uncertainty requires the cost of taking action and its probability, along with the cost of not taking action and the probability of that occurrence.  In most cases the probabilities are not well known (perhaps to one significant digit) and the costs are not well known (perhaps to the nearest $10,000).  The quantitative assessment is called risk assessment.  The task is to take these poorly identified inputs and devise a strategy that minimizes the business’s exposure to risk.  The graphical representation of the methodology is called a decision tree, which is used to reach the expected values for the decision to take or not take action.

Why:      Most business decisions have no exact answers, i.e., no black and white answers but rather shades of grey.  The tool helps decide which course of action may be to the advantage of the business, given the best estimates that can be made.

When:    Decisive details will only be known in the future, yet decisions have to be made today, so decision trees are tools to help span wisely from today into the future with the best decisions that can be made from sketchy data.

Where:  If you have absolute data, use it.  But most decisions must be made with indecisive information, which requires judging the odds for a given event, usually based on estimates; the wiser the estimate, the better the decision, taking into account the probabilities of the outcomes and the money involved.  Use this tool when few details are available and you must be the pioneer who cuts through the forest to reach the promised land of opportunity and profitable ventures.
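The expected-value arithmetic behind a two-branch decision tree can be sketched in Python. The gearbox scenario, costs, and probabilities are hypothetical; because such probabilities are often known only to about one significant digit, the sketch also shows how the answer changes with the failure probability.

```python
def expected_cost(outcomes):
    """Expected cost of one branch: sum of probability * cost over its possible outcomes."""
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities on a branch must sum to 1")
    return sum(p * cost for p, cost in outcomes)


def choose(p_failure, overhaul_cost=60_000, failure_cost=250_000):
    """Compare the hypothetical 'overhaul now' and 'run as is' branches; return the cheaper one."""
    branches = {
        "overhaul now": expected_cost([(1.0, overhaul_cost)]),
        "run as is": expected_cost([(p_failure, failure_cost), (1.0 - p_failure, 0)]),
    }
    return min(branches, key=branches.get), branches


for p in (0.1, 0.2, 0.3):
    best, branches = choose(p)
    print(f"p(failure) = {p}: {best} (run-as-is expected cost ${branches['run as is']:,.0f})")
```

With these figures the break-even failure probability is 60,000/250,000 = 0.24, so an estimate of 0.2 versus 0.3 flips the decision. That sensitivity is worth showing alongside the tree.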


 

Dependability-

What:    The International Electrotechnical Commission (IEC) defines dependability as follows: “Dependability describes the availability performance and its influencing factors: reliability performance, maintainability performance and maintenance support performance.”  MIL-HDBK-338 defines dependability differently, as a measure of the degree to which an item is operable and capable of performing its required function at any (random) time during a specified mission profile, given that the item is available at mission start.  (Item state during a mission includes the combined effects of the mission-related system R&M parameters but excludes non-mission time; see availability.)  Dependability is related to reliability, with the intention that dependability be a more general concept than the measurable issues of reliability, maintainability, and maintenance.

Why:      The key dependability issue is to make equipment and processes work as advertised, that is, without failure.  Dependability aims at facilitating cooperation among all parties concerned (supplier, organization, and customer) by fostering an understanding of dependability needs and value to achieve the overall dependability objectives, so it involves harmonizing conflicting issues.  Dependability is better seen from the end user’s viewpoint than from the designer’s or the maintainer’s viewpoint.  From a system effectiveness viewpoint, reliability and maintainability provide system availability and dependability.

When:    You cannot repair your way to happiness with a failure-prone system, because the failure-prone system will be viewed as lacking the dependability to function as required when you need it.  Thus dependability is viewed over the longer term, not in convenient snapshots, and it also involves life cycle cost issues.

Where:  Reliability contributes directly to uptime by avoiding failures, whereas maintainability contributes directly to reducing downtime through faster repairs.  Thus reliability and maintainability jointly determine the dependability of the system.  Dependable systems must be ready to function, in an operable state, to produce the desired output upon demand by the end user, at the specified quantity and quality of output.


 

Design Reviews For Reliability-

What:    Specific questions to ask the design engineers during a review specifically for reliability, using failure data from operations and maintenance, are: 1) show the calculated availability for the system based on a reliability and maintainability (RAM) model, 2) show the calculated number of failures during the specified mission time between turnarounds based on the RAM model, 3) show details of FMEA studies, 4) show details of FTA calculations, 5) show the calculated mean times between downing events, 6) show the calculated mean time between cutbacks from full production capability and the losses thus incurred, 7) show the QFD matrix and details, and 8) show the calculated cost of unreliability.

Why:      Design reviews should demonstrate, by calculation or through the use of models and reliability tools, that the system is capable of achieving the design objectives, rather than taking a giant leap of faith that all will be well.

When:    Design reviews for reliability should be a part of the design process, starting with conceptual designs and ending when the drawings are revised for the as-built system.

Where:  This is a logical extension of the design process: show me, rather than tell me, how the system will function.  It is performed as part of the up-front, design-by-the-numbers process.


 

Effectiveness-

What:    The potential or actual probability of a system to perform a mission for a given level of performance under specified operating conditions defined as the product of reliability*availability*maintainability*capability.  Many variants of the effectiveness equation exist, e.g., OEE, and others.

Why:      The effectiveness equation defines the ability of a product, operating under specified conditions, to meet operational demands when called upon.  It is a practical measure of how well the system is performing: not how well we want it to perform, but how it is actually doing.  Since all the elements are measured on a scale from 0 to 1, the equation quickly draws the eye to the weakest elements, where opportunities exist for making improvements.

When:    The effectiveness equation is useful for building trade-off boxes among alternatives, plotted on an X-Y scale of effectiveness vs net present value (NPV), when selecting improvement alternatives.  For the elements:
reliability defines the probability of a failure free interval (or the complement unreliability which describes the probability of failure),
availability defines the probability of the system being up and alive to handle the demand (or the complement, unavailability which describes the probability of the system being down),
maintainability defines the probability of making repairs within the allowed repair standard,
capability defines the probability of production achieving the desired production results [a measure of how well the product performs compared to the standard], and frequently it is described as the product of efficiency * utilization, where
      efficiency is an output/input relationship such as (output achieved)/(the standard required), and
      utilization is how time is used, such as (direct labor)/(direct labor + labor lost) [in the old days, if this index decreased to as low as 80% we went berserk; today, you can’t get this high because of wasted time when noses are not to the grindstone!!!].
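With hypothetical values for each element, the effectiveness product and the element that most needs attention can be computed as follows. This is a minimal Python sketch that takes capability as efficiency * utilization, as described above.

```python
def capability(efficiency, utilization):
    """Capability taken as the product efficiency * utilization."""
    return efficiency * utilization


def effectiveness(reliability, availability, maintainability, capability_value):
    """Effectiveness = reliability * availability * maintainability * capability."""
    return reliability * availability * maintainability * capability_value


# Hypothetical element values, each on a 0-to-1 scale (order matches effectiveness()).
elements = {
    "reliability": 0.90,
    "availability": 0.95,
    "maintainability": 0.85,
    "capability": capability(efficiency=0.92, utilization=0.80),
}
e = effectiveness(*elements.values())
weakest = min(elements, key=elements.get)
print(f"effectiveness = {e:.3f}; weakest element: {weakest}")
# effectiveness = 0.535; weakest element: capability
```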

Where:  It is used to describe the performance of both new and old systems.  Consider this example for effectiveness:  If we are comparing a heavy-duty truck with a sports car for transportation, the truck may be more effective for heavy loads whereas the sports car may be more effective for acceleration and high speeds; neither is defined by the effectiveness equation until the mission is defined.


 

Environmental Stress Screening (ESS)-

What:    A series of screens conducted under environmental stresses to disclose weak parts and workmanship defects that require correction.  This requires an understanding of both burn-in testing and ESS, because both techniques identify weak points and eliminate them by motivating early failures.  Burn-in is usually a long process of operating under load(s) at a fixed temperature (in short, a special case of ESS), or it can be run at varying loads and accelerated temperatures to achieve a shorter burn-in period.  ESS, by contrast, is a scientifically planned and conducted test, usually run under accelerated loads, that produces the same test/use results in a shorter period of time by increasing the stress on the components or assemblies.  The objective of these screens is to produce a failure-free product when it is released into operations.  ESS is not intended as a test to validate compliance with a design; it is intended to force latent defects to surface as failures before the end user finds them in day-to-day usage.

Why:      The extremes of operating conditions, such as high power levels, high temperatures, and high vibration levels, produce failures not anticipated from testing at nominal conditions.  ESS is generally interpreted as applying to electrical/electronic equipment; however, the same concepts apply to mechanical equipment when the stressing conditions are loads, pressures, temperatures, vibrations, thermal shocks, etc.  As with all reliability issues, think broadly!

When:    When acquiring data, the tests are done ahead of production.  When controlling early failures that would otherwise be discovered by the end user, these tests are done as part of the production process to eliminate weak units, control warranty costs, and improve customer satisfaction.

Where:  Some tests are conducted in the laboratory for quick results and then the data is used to control product testing/release for the purpose of limiting costs and preventing the loss of customers from unsatisfactory performance in the field.


 

Events/Incidents-

What:    Events/incidents are single occurrences, especially significant ones, that result in a failure from a non-aging mechanism.  Usually the event/incident has the serious consequence of ending the functional life of a component or system.  For reliability purposes, the death of the device must be recorded as censored (suspended) data.

Why:      For reliability purposes, the component, device, subassembly, or system was a success up to the point in life where the failure from a non-aging event took place.  The event-age was therefore a success (up to the point it was killed by the event/incident), and the data must be included as censored/suspended data; this is important data.

When:    Include the suspended/censored data in every analysis.  Young suspensions have little impact on the results of an analysis, but old suspensions have a major effect.

Where:  The data is used for MTBF/MTTF analysis and particularly for Weibull analysis.
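To see why suspensions must stay in the data set, here is a minimal Python sketch of the usual constant-failure-rate (exponential) estimate of mean life, which divides the total operating time of all items, failed and suspended, by the number of failures. The ages are hypothetical, and Weibull analysis treats suspensions with its own methods rather than this formula.

```python
def mean_life_estimate(failure_ages, suspension_ages):
    """Exponential estimate of MTBF/MTTF: total time on all items / number of failures.

    Suspended items (removed, or killed by an event/incident) add operating time
    but no failure, so leaving them out understates the mean life.
    """
    if not failure_ages:
        raise ValueError("at least one failure is needed for a point estimate")
    total_time = sum(failure_ages) + sum(suspension_ages)
    return total_time / len(failure_ages)


# Hypothetical data: three failures in the mode under study, two units lost to incidents.
failures = [1200.0, 3400.0, 5100.0]   # hours
suspensions = [2500.0, 6000.0]        # hours, recorded as censored data
with_susp = mean_life_estimate(failures, suspensions)  # 18200 / 3
without_susp = mean_life_estimate(failures, [])        # 9700 / 3
print(f"with suspensions:    {with_susp:,.0f} h")      # 6,067 h
print(f"without suspensions: {without_susp:,.0f} h")   # 3,233 h
```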


 

Exponential Distribution-

What:    The exponential distribution describes the probability of survival and of failure of components or equipment under the condition of chance failure, which means a constant instantaneous failure rate: the die-off rate is the same for any surviving (unfailed) population.  An old part is as good as a new part.  In this memoryless system, among the survivors at any time t, the same percentage will fail during any specified interval of a given length, regardless of their age.  The reliability of a system is often described by the exponential distribution because a system is frequently made up of mixed failure modes that, in the aggregate, behave like a constant failure rate system.  The reliability of the exponential distribution is described mathematically as $R(t) = e^{-\lambda t} = e^{-t/\theta}$, where $t$ is the mission time, $\lambda$ is the failure rate, and $\theta$ is the mean time, given that $\lambda = 1/\theta$.  The exponential distribution is frequently used as a first approximation to describe reliability based on a simple failure rate or a simple mean time to failure, particularly if the system or component has multiple failure modes.
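A short Python check of the formula and of the memoryless property, using a hypothetical mean time of 5000 hours:

```python
import math


def exp_reliability(t, theta):
    """R(t) = exp(-t/theta) = exp(-lambda * t), where lambda = 1/theta."""
    return math.exp(-t / theta)


THETA = 5000.0   # hypothetical mean time to failure, hours
LAMBDA = 1.0 / THETA
mission = 500.0  # hours

r_new = exp_reliability(mission, THETA)
r_new_rate_form = math.exp(-LAMBDA * mission)  # same value via the failure-rate form
# Memoryless: a unit that has already survived 4000 h has the same chance of
# surviving the next 500 h as a new unit, R(4500)/R(4000) = R(500).
r_old = exp_reliability(4000.0 + mission, THETA) / exp_reliability(4000.0, THETA)

print(f"R(500 h), new unit:                {r_new:.4f}")
print(f"R(next 500 h | survived 4000 h):   {r_old:.4f}")
print(f"R at t = theta:                    {exp_reliability(THETA, THETA):.3f}")  # 0.368
```

The last line shows that only about 37% of units survive to the mean time, a frequent surprise when a mean time is read as a guaranteed life.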

Why:      The constant hazard rate, $\lambda$, is usually the result of combining many failure rates into a single number.

When:    The exponential distribution is frequently used for reliability calculations as a first cut, because its simplicity makes it easy to generate a first estimate of reliability when more detailed failure modes are not described.

Where:  Electronic systems can have many different failure modes, since any electrical/electronic system is an amalgam of many different components, so the simple assumption is that the electrical/electronic package has a constant failure rate described by the exponential distribution.  When in doubt about the failure mechanisms, it is common to assume the exponential distribution, with its constant failure rate, for simplicity.

Failure-

What:    Failure is the loss of function when you needed the function to occur.  Failures for reliability purposes must be precisely defined so they are recorded correctly.  Much life data is incomplete because failures are mixed up with censored/suspended data: aged items that have not failed, items removed from service before failure, or items that have not yet failed in the mode of failure under study.  In short, these censored/suspended items represent successes and are part of the data set under study.

Why:      We study failed items for the same reason we perform autopsies on humans: we want the data, and we want it categorized correctly for making important decisions.  Recording failures requires: 1) a time origin, which must be unambiguously defined, 2) a scale for measuring the passage of time/starts/stops/etc. that drives failure, and 3) a meaning of failure that is entirely clear for recording the event.
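A minimal Python sketch of life records that capture the three requirements above (a time origin, a time scale, and a clear failed/not-failed flag), and of why suspensions must stay in the data set. It assumes a constant failure rate, for which the mean time estimate is total operating time divided by the number of failures; the record layout, item IDs, and hours are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LifeRecord:
    """One item's life history (hypothetical record layout)."""
    item_id: str
    hours: float  # operating hours since the defined time origin (e.g., installation)
    failed: bool  # True = failed in the mode under study; False = censored/suspended

records = [
    LifeRecord("P-101", 1200.0, True),
    LifeRecord("P-102", 3400.0, False),  # still running: a suspension, a success so far
    LifeRecord("P-103", 800.0, True),
    LifeRecord("P-104", 2600.0, False),  # removed for another reason before failing
    LifeRecord("P-105", 2000.0, True),
]

failures = sum(r.failed for r in records)
total_hours = sum(r.hours for r in records)  # suspensions contribute their operating time too

# Constant-rate estimate: total operating time / number of failures
mtbf_with_suspensions = total_hours / failures
# Common mistake: ignoring the suspended items entirely
mtbf_failures_only = sum(r.hours for r in records if r.failed) / failures

print(f"{failures} failures in {total_hours:.0f} h -> MTBF = {mtbf_with_suspensions:.0f} h")
print(f"failures-only MTBF (biased low) = {mtbf_failures_only:.0f} h")
```

Dropping the two suspended items understates the mean time by more than half in this example (about 1,333 h instead of about 3,333 h).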

When:    Failure data must be recorded as it occurs to prevent loss of information.

Where:  The CMMS (computerized maintenance management system) is frequently where most failure data resides, but usually in crude form.  The failure data is often transferred into a FRACAS (failure reporting, analysis, and corrective action system) for converting the symptoms of failure into the root causes of failure.  The failure data must then be converted into action items for making management decisions about future failures and the corrective action needed.

Failure Forecast-

What:    Failure forecasting is a projection of failures into the future based on assumed or documented failure details.  It is also known as risk analysis of future failures.  For a constant failure rate system this is very straightforward.  However, for complicated failure modes where the failure rate increases with time (wear-out failure modes) or decreases with time (infant mortality failure modes), the analysis becomes more complicated, as described by the Abernethy Risk analysis in The New Weibull Handbook and implemented in the software package WinSMITH Weibull for predicting future failures.  Likewise, reliability block diagrams are useful for predicting future failures when authentic failure details are supplied to Monte Carlo models.
Please note that manufacturers follow two general strategies for their equipment:
      1) build the equipment to avoid failures even though this increases the original capital cost, or
      2) build the equipment and sell it at a low cost (or even at break-even cost),
          expecting to make profits from the sale of replacement parts.
Thus for end users of the procured equipment, it is important to know the forecasted failures in the face of supplier protests that “our equipment never fails.”  In that case, ask to see spare-parts sales for similar equipment and an estimate of the number of units in service to get a crude estimate of the strategy employed by the equipment supplier.
A failure is an event which renders equipment non-useful for the intended or specified purpose during a designated time interval.  The failure can be sudden, partial, one-shot, intermittent, gradual, complete, or catastrophic.  It may develop by gradual degradation or occur suddenly or as a one-shot event, and it may arise from weakness, imperfections, misuse, and so forth.
A failure mechanism is any of a variety of physical processes that result in failure from chemical, electrical, thermal, or other insults.
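The conditional-probability idea behind a Weibull failure forecast can be sketched in Python: each unfailed unit of age t contributes [F(t+u) - F(t)]/[1 - F(t)], its chance of failing in the next u hours, and the contributions are summed over the fleet. This is a simplified sketch in the spirit of the Abernethy Risk analysis, not the WinSMITH Weibull implementation; the fleet ages, η, β values, and usage are hypothetical.

```python
import math

def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Weibull probability of failure by age t: F(t) = 1 - exp(-(t/eta)**beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def expected_failures(ages, usage, beta, eta):
    """Sum of each unfailed unit's conditional probability of failing in the next `usage` hours."""
    total = 0.0
    for age in ages:
        f_now = weibull_cdf(age, beta, eta)
        f_next = weibull_cdf(age + usage, beta, eta)
        total += (f_next - f_now) / (1.0 - f_now)
    return total

# Hypothetical fleet: current ages (hours) of 100 unfailed units
fleet_ages = [500, 1500, 2500, 3500, 4500] * 20
eta = 10_000.0      # characteristic life, hours (assumed)
next_year = 4000.0  # operating hours in the forecast period (assumed)

for beta in (0.7, 1.0, 3.0):  # infant mortality, chance failure, wear-out
    n = expected_failures(fleet_ages, next_year, beta, eta)
    print(f"beta = {beta}: about {n:.1f} failures expected, no replacement")
```

With β = 1 every unit contributes the same 1 - e^(-0.4) regardless of age, which is why the constant failure rate case is straightforward; for β ≠ 1 the answer depends on the age mix of the fleet.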

Why:      Future failures cost money and frequently increase the risk of safety or environmental problems.  For manufacturers, forecasted failures predict impending high warranty expenses, which can make or break a company.  With a good failure forecast, you can anticipate expected failures now (after x usage), future failures when failed units are not replaced, and future failures when failed units are replaced either with parts having the same failure modes or with differently designed components having different failure details.
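To contrast forecasts with and without replacement, a small Monte Carlo sketch in Python can resample Weibull lives for each installed position. It assumes replacements are new parts with the same failure details (same β and η) and that replacement takes no time; all numbers and function names are hypothetical.

```python
import random

def remaining_life(age: float, beta: float, eta: float, rng: random.Random) -> float:
    """Sample the remaining Weibull life of a unit that has already survived to `age`.

    With cumulative hazard H(t) = (t/eta)**beta, the failure age T satisfies
    H(T) = H(age) + E, where E is a unit exponential random variable.
    """
    h_at_failure = (age / eta) ** beta + rng.expovariate(1.0)
    return eta * h_at_failure ** (1.0 / beta) - age

def simulate_failures(ages, horizon, beta, eta, replace, trials=2000, seed=1):
    """Average number of failures in the next `horizon` hours across `trials` runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for age in ages:
            clock = remaining_life(age, beta, eta, rng)  # time until the installed unit fails
            while clock <= horizon:
                total += 1
                if not replace:
                    break  # position stays empty after the failure
                clock += remaining_life(0.0, beta, eta, rng)  # new part starts at age 0
    return total / trials

# Hypothetical fleet: 100 installed positions with these current ages (hours)
fleet_ages = [500, 1500, 2500, 3500, 4500] * 20

for replace in (False, True):
    n = simulate_failures(fleet_ages, 4000.0, beta=0.7, eta=10_000.0, replace=replace)
    print(f"replace={replace}: about {n:.1f} failures expected in the next 4000 h")
```

With β < 1 (infant mortality) the new replacement parts are the most failure-prone units, so the forecast with replacement rises noticeably above the forecast without it.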

When:    This analysis is best performed during the design of the equipment; however, many surprises arise from different failure modes built into the assembled product or incurred by unanticipated usage in operations.

Where:  Generally this analysis is made during the up-front design effort, often met with disbelief that the products could be “this bad.”  Follow-up analysis occurs when unexpected failure modes arise during operation of the equipment, causing loss of service and high costs for the end users.

Failure Rates-

What:    Failure rates, in the simplest form, are Σ(number of failures)/Σ(time in use), which is the reciprocal of the mean time to/between failures.  For more sophisticated failure databases, such as Weibull databases, failure rates can be disclosed without giving away proprietary data such as the shape factor, β (beta), which tells the failure mode for the equipment.
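The simple failure-rate arithmetic, and its reciprocal relationship with the mean time between failures, takes only a few lines of Python; the pump hours and failure counts are hypothetical.

```python
# Hypothetical one-year history for four pumps
operating_hours = [8200, 7900, 8600, 8100]  # hours in service per pump
failures = [2, 1, 3, 0]                     # failures recorded per pump

failure_rate = sum(failures) / sum(operating_hours)  # failures per operating hour
mtbf = 1 / failure_rate                              # mean operating hours between failures

print(f"failure rate = {failure_rate:.2e} per hour")  # 1.83e-04
print(f"MTBF = {mtbf:.0f} hours")                     # 5467
```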

Why:      Simple failure rates are precursors of the maintenance events and production interruptions that will occur in the future, which drive up costs and cause chaos.

When:    Failure rates derive from the history of operation or from well-known data sources such as OREDA, IEEE 500, IEEE 493, EPRI, and other sources listed in reliability reading lists, including Weibull databases.

Where:  Failure rates serve as an awareness criterion for the average person, just as you use automobile fuel consumption rates to understand the health of your automobile and to anticipate your weekly/monthly/annual out-of-pocket expenditures for gasoline or diesel fuel.  Failure rates drive the maintenance interventions, spare parts, and maintenance costs for the Maintenance Department.  Similarly, they predict interruptions to the process, which lead to missed delivery promises and negative variances in production costs.  In short, failure rates are precursors of the misery expected for the organization.
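As an illustration of how a failure rate drives spare-parts decisions, the Python sketch below converts a constant failure rate into expected annual failures and a stock level. It assumes constant failure rates and prompt replacement of failed units, so the number of failures in the period follows a Poisson distribution; the 95% coverage target, function names, and inputs are hypothetical.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """P(N <= k) for a Poisson count with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

def spares_needed(rate_per_hour: float, units: int, hours: float, confidence: float = 0.95):
    """Smallest stock that covers all failures in the period with the given probability.

    Assumes constant failure rates with prompt replacement, so failures are Poisson.
    """
    mean = rate_per_hour * units * hours  # expected failures in the period
    k = 0
    while poisson_cdf(k, mean) < confidence:
        k += 1
    return mean, k

# Hypothetical: 12 installed units, 1.0e-4 failures/hour each, one year of operation
mean, stock = spares_needed(rate_per_hour=1.0e-4, units=12, hours=8760)
print(f"expected failures per year = {mean:.1f}; stock {stock} spares for 95% coverage")
```

Stocking only the average number of failures would run out of spares in many years; the Poisson quantile adds the margin that a service-level target calls for.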

Fault Tree Analysis-

What:    Fault tree analysis (FTA) is a top-down process that defines the top-level problem and, through a deductive approach using parallel and series combinations of possible malfunctions, finds the root of the problem so it can be corrected before the failure occurs.  The tool can be used qualitatively or quantitatively.
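For the quantitative use of FTA, a minimal Python sketch can evaluate AND and OR gates from basic-event probabilities. It assumes the basic events are independent, which is exactly what common cause failures violate, so those would need to be modeled as separate basic events; the event names and probabilities are hypothetical.

```python
import math

def or_gate(*probs: float) -> float:
    """Output occurs if ANY input occurs: 1 - product of (1 - p), for independent inputs."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def and_gate(*probs: float) -> float:
    """Output occurs only if ALL inputs occur: product of p, for independent inputs."""
    return math.prod(probs)

# Hypothetical annual basic-event probabilities
pump_a_fails = 0.10
pump_b_fails = 0.10  # redundant standby pump
power_loss = 0.02
valve_fails_shut = 0.05

# Top event "loss of cooling water" = (both pumps fail) OR power loss OR valve fails shut
both_pumps = and_gate(pump_a_fails, pump_b_fails)  # 0.01
top_event = or_gate(both_pumps, power_loss, valve_fails_shut)

print(f"P(loss of cooling water) = {top_event:.4f}")  # 0.0783
```

Here the single valve dominates the top event even though each pump fails more often, which is the kind of weak link the tree is meant to expose.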

Why:      The tool aids the design process, shows the weak links that cause failures, and, in the critical legs of the tree, helps define maintenance strategies: which pieces of equipment and processes should be defended with the greatest maintenance vigor to prevent “Murphy” from shutting down the process or causing serious safety issues.  The technique provides a graphical aid for the analysis, and it accommodates many failure modes, including common cause failures.  Results from an FTA are usually more pessimistic than those from other analysis tools such as RBDs, as you can see from a study of the Space Shuttle reliability analysis, where each system is studied by multiple reliability tools because of the high cost/profile of failures.

When:    FTA is widely used in the design phase of nuclear power plants and subsea control and distribution systems, and for oversight in layers-of-protection studies for process safety and loss control in chemical plants and refineries, to prevent accidents and control the costs of risks.  The technique is helpful for identifying critical fault paths, spotting obscure failure combinations before they occur in reality, comparing alternate designs for safety, and providing management with a methodology to evaluate the overall hazards in a system and avoid single sources of critical failures.  Finally, when thinking top down about where and how failures can occur, the methodology gives a diagram for setting maintenance strategies that protect key pieces of equipment/processes from failure.

Where:  FTA is helpful for defining potential event sequences and potential incidents, evaluating the incident consequences of outcomes, and estimating the risks of events occurring.  FTAs work in the design room and on the operating floor where first hand kn