Articles: Reliability Engineering
Data
Data is the informational energy that runs the reliability improvement machine. Data is acquired at great cost, so it needs to be retained and used to prevent future failure events. Proper use of data provides an understanding of failure mechanisms and prevents recurrence of the bad events that cause safety or high-cost failures. Reliability data requires a definition of failure. Failures can be catastrophic or slow degradation; you decide by defining the failures. The data must be measured in the units that drive the degradation: sometimes hours, sometimes miles, and so forth; in short, whatever motivates the failure. An item's life record always ends either with a failure or with removal from service at some age before failure, and such removals generate a category of data called suspended or censored data. Data is information in the form of facts, figures, or engineering databases obtained from engineering tests, experiments, or actual operating conditions. Reliability data is often incomplete because exact times to failure are rarely known or recorded with much precision, so only partial information is available for analysis. Reliability data comes in two forms: 1) age-to-failure data, and 2) censored/suspended data, such as occurs when unfailed items are removed from service or when they fail from a failure mode other than the one being studied. Suspensions are useful information and part of the data set. Some data is better than no data for resolving reliability issues.
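To illustrate why suspensions belong in the data set, here is a minimal Python sketch that estimates a failure rate from mixed failure and suspension records. It assumes a constant failure rate (exponential model), under which the maximum-likelihood estimate is the number of failures divided by the total operating age of all units, failed and suspended alike. `LifeRecord`, `exponential_mle`, and the ages used are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LifeRecord:
    age: float    # age at failure or removal, in the units that drive the failure (hours, miles, ...)
    failed: bool  # True: failed in the mode under study; False: suspension (censored)


def exponential_mle(records: list[LifeRecord]) -> tuple[float, float]:
    """Return (failure_rate, mean_life), assuming a constant failure rate.

    Every unit adds its full age to the exposure time; only failures
    in the mode under study add to the failure count.
    """
    failures = sum(1 for r in records if r.failed)
    if failures == 0:
        raise ValueError("no failures: the failure-rate estimate is undefined")
    exposure = sum(r.age for r in records)
    rate = failures / exposure
    return rate, 1.0 / rate


# Hypothetical records: two failures and two suspensions (hours).
records = [
    LifeRecord(1200.0, True),
    LifeRecord(800.0, False),
    LifeRecord(1500.0, True),
    LifeRecord(2000.0, False),
]

rate, mean_life = exponential_mle(records)
print(f"with suspensions: mean life = {mean_life:.0f} h")

failures_only = [r for r in records if r.failed]
print(f"failures only:    mean life = {exponential_mle(failures_only)[1]:.0f} h")
```

Dropping the two suspensions cuts the estimated mean life from 2750 h to 1350 h, which is the kind of bias that discarding censored data introduces.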
Decision Trees
Most business decisions carry considerable uncertainty, which implies at least two outcomes for any course of action you choose. Making decisions in the face of uncertainty requires the cost and probability of each outcome of taking action, along with the cost and probability of each outcome of not taking action. In most cases the probabilities are not well known (perhaps to one significant digit) and the costs are not well known (perhaps to the nearest $10,000). This quantitative assessment is called risk assessment. The task is to take these poorly defined quantities and devise a strategy that minimizes the business's exposure to risk. The graphical representation of this method is the decision tree, which is used to reach the expected value of the decision to take or not take action.
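The roll-back step of a decision tree is a probability-weighted sum for each branch. The Python sketch below assumes a single decision with two branches and scores each by expected cost; `Outcome`, `roll_back`, and all probabilities and dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # often known only to about one significant digit
    cost: float         # consequence cost in dollars, often rough


def expected_cost(outcomes: list[Outcome]) -> float:
    """Expected cost of one decision branch (probability-weighted sum)."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"branch probabilities sum to {total_p}, not 1")
    return sum(o.probability * o.cost for o in outcomes)


def roll_back(branches: dict[str, list[Outcome]]) -> tuple[str, dict[str, float]]:
    """Roll back a one-level decision tree: pick the option with the lowest expected cost."""
    costs = {name: expected_cost(outcomes) for name, outcomes in branches.items()}
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical numbers: a $20,000 action cuts the chance of a $200,000 failure from 0.3 to 0.1.
tree = {
    "take action": [Outcome(0.9, 20_000), Outcome(0.1, 20_000 + 200_000)],
    "do nothing": [Outcome(0.7, 0), Outcome(0.3, 200_000)],
}

best, costs = roll_back(tree)
for name, cost in costs.items():
    print(f"{name:12s} expected cost ${cost:,.0f}")
print("lowest expected cost:", best)
```

Because the inputs are rough, it is worth re-running the roll-back with each probability and cost pushed to the ends of its plausible range to see whether the preferred branch changes.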
Dependability
The International Electrotechnical Commission (IEC) defines dependability as "Dependability describes the availability performance and its influencing factors: reliability performance, maintainability performance and maintenance support performance." MIL-HDBK-338 defines dependability differently, as a measure of the degree to which an item is operable and capable of performing its required function at any (random) time during a specified mission profile, given that the item is available at mission start. (Item state during a mission includes the combined effects of the mission-related system R&M parameters but excludes non-mission time; see availability.) Dependability is related to reliability, but it is intended as a more general concept than the measurable attributes of reliability, maintainability, and maintenance support.
Deriving Task Periodicities Within Reliability Centered Maintenance (RCM)
By Bill Berneski, CRE, CMRP
This article does not present original theorems in mathematics or RCM, but is an attempt to consolidate some existing but distinct concepts relating to maintenance periodicity selection, and to provide some guidance on the best way to apply them.
An RCM analysis should result in the optimum failure-management strategy for a piece of equipment or a system. Choosing a failure-management strategy means the analyst decides whether to prevent a failure mode (preventive maintenance), modify the equipment to preclude it (alterative maintenance), or simply allow it to occur and accept the consequences (fix when fail). When the choice is to prevent the failure mode, a preventive maintenance (PM) task is chosen. But once a task is chosen, the tough question is: how often should it be done?
We will attempt to answer that question.
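The article's own derivation is not included here. As one illustration of how a periodicity can be derived, the Python sketch below applies the classical age-replacement cost model: replace at age T (cost Cp) or at failure if sooner (cost Cf), and pick the T that minimizes expected cost per operating hour. It assumes Weibull-distributed life, and the shape, characteristic life, and costs are hypothetical. A finite optimum exists only when the failure rate rises with age (β > 1) and a failure costs more than a planned replacement.

```python
import math
from collections.abc import Iterable


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    return math.exp(-((t / eta) ** beta))


def cost_rate(T: float, beta: float, eta: float, cost_pm: float,
              cost_failure: float, steps: int = 2000) -> float:
    """Expected cost per operating hour when replacing at age T or at failure.

    Renewal-reward form: (Cp * R(T) + Cf * (1 - R(T))) / integral_0^T R(t) dt.
    """
    h = T / steps
    # Trapezoidal rule for the expected cycle length, the integral of R(t) from 0 to T.
    area = 0.5 * (weibull_reliability(0.0, beta, eta) + weibull_reliability(T, beta, eta))
    area += sum(weibull_reliability(i * h, beta, eta) for i in range(1, steps))
    area *= h
    r_T = weibull_reliability(T, beta, eta)
    return (cost_pm * r_T + cost_failure * (1.0 - r_T)) / area


def best_interval(beta: float, eta: float, cost_pm: float,
                  cost_failure: float, candidates: Iterable[int]) -> int:
    """Grid search: the candidate replacement age with the lowest cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, beta, eta, cost_pm, cost_failure))


BETA, ETA = 3.0, 1000.0              # hypothetical wear-out mode: shape 3, characteristic life 1000 h
COST_PM, COST_FAIL = 1_000.0, 10_000.0
candidates = range(50, 3001, 50)

T_best = best_interval(BETA, ETA, COST_PM, COST_FAIL, candidates)
print(f"replace at about {T_best} h; cost rate "
      f"{cost_rate(T_best, BETA, ETA, COST_PM, COST_FAIL):.2f} $/h")
```

Re-running with `beta` at or below 1 pushes the "optimum" to the largest candidate, meaning age-based replacement never pays; this is why age-based tasks are usually reserved for wear-out failure modes.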
Design Reviews For Reliability
Specific questions to ask the design engineers during a design review focused on reliability, using failure data from operations and maintenance, are: 1) show the calculated availability for the system based on a reliability, availability, and maintainability (RAM) model, 2) show the calculated number of failures during the specified mission time between turnarounds based on the RAM model, 3) show details of the FMEA studies, 4) show details of the FTA calculations, 5) show the calculated mean time between downing events, 6) show the calculated mean time between cutbacks from full production capability and the losses thus incurred, 7) show the QFD matrix and details, and 8) show the calculated cost of unreliability.
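Questions 1, 2, and 5 come down to simple arithmetic once a RAM model exists. The Python sketch below is a minimal series model, assuming independent blocks, constant failure rates, and that any block failure downs the system; the block names, MTBF/MTTR values, and the 8760 h run between turnarounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    mtbf: float  # mean time between failures, hours
    mttr: float  # mean time to repair, hours


def availability(b: Block) -> float:
    """Inherent availability of one block: MTBF / (MTBF + MTTR)."""
    return b.mtbf / (b.mtbf + b.mttr)


def series_availability(blocks: list[Block]) -> float:
    """Every block must be up for the system to be up, so availabilities multiply."""
    a = 1.0
    for b in blocks:
        a *= availability(b)
    return a


def system_failure_rate(blocks: list[Block]) -> float:
    """Constant failure rates of series blocks add: lambda_sys = sum(1 / MTBF_i)."""
    return sum(1.0 / b.mtbf for b in blocks)


# Hypothetical three-block production train and an 8760 h run between turnarounds.
train = [
    Block("pump", mtbf=8_000, mttr=24),
    Block("motor", mtbf=20_000, mttr=16),
    Block("controls", mtbf=10_000, mttr=8),
]
mission_hours = 8_760

lam = system_failure_rate(train)
print(f"system availability        {series_availability(train):.4f}")
print(f"expected failures/mission  {lam * mission_hours:.2f}")
print(f"mean time between downings {1.0 / lam:,.0f} h")
```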
Detective Maintenance
When we think about maintenance strategies, the words predictive, preventive, corrective, and breakdown spring to mind. There is, however, an important class of tasks that we perform to ensure that our equipment and Plant remain safe and productive. These tasks are based on a Detective Maintenance strategy. They help us win our licence to operate and ensure long-term viability. With machinery and Plants becoming increasingly complex, the proportion of such tasks in the total maintenance program is growing.
Managing a business efficiently means that we have to manage risks well. In turn, this requires that our safety devices and systems work on demand. It is possible to arrive logically at the required availability of the items in question and to find suitable detective maintenance strategies. While the analysis is relatively easy, there are several hurdles in implementing its results. These challenges can be met by a range of solutions, which are not universal and need to be tailored to each situation.
The word pro-active is very popular, especially in the maintenance context. Detective Maintenance strategies are pro-active. More importantly, they are essential to long-term success.
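One widely used way to turn a required availability into a detective (failure-finding) task interval is the approximation FFI ≈ 2 × U × MTBF, where U is the tolerable unavailability of the protective device. The Python sketch below shows it alongside the exact mean unavailability it approximates, assuming the device fails at a constant rate, failures stay hidden until tested, and each test is instantaneous, perfect, and restores the device to as-new. The function names and device figures are hypothetical.

```python
import math


def failure_finding_interval(mtbf: float, target_unavailability: float) -> float:
    """Approximate test interval: FFI ~= 2 * U * MTBF (valid when FFI << MTBF)."""
    return 2.0 * target_unavailability * mtbf


def mean_unavailability(interval: float, mtbf: float) -> float:
    """Average fraction of time a hidden, exponentially failing device is dead,
    if it is tested every `interval` and restored to as-new whenever found failed.

    U = 1 - (1 - exp(-x)) / x with x = interval / mtbf, which is about x / 2 for small x.
    """
    x = interval / mtbf
    # expm1 keeps precision when x is small: 1 - exp(-x) == -expm1(-x).
    return 1.0 + math.expm1(-x) / x


# Hypothetical protective device: MTBF 50 years, required availability 99 % (U = 0.01).
mtbf_years = 50.0
ffi = failure_finding_interval(mtbf_years, 0.01)
print(f"test every {ffi:.2f} years -> mean unavailability "
      f"{mean_unavailability(ffi, mtbf_years):.5f}")
```

The exact figure comes out slightly below the 1 % target, so the approximation is mildly conservative when the interval is short compared with the MTBF.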
Don’t Underestimate PQ’s Impact on Reliability and Process Uptime
By Ross Ignall, Director of Product Management, Dranetz
Industrial facilities are finding new respect for power quality (PQ) issues, driven by proliferating non-linear loads and other factors. As a result, sales are up for power quality monitoring and analysis instruments, which identify and characterize PQ problems as a first step to applying the fix.
The increased demand for productivity improvements in the global economy has clearly resulted in an escalating need for the reliable operation of electronic and electrical equipment. Unfortunately, the grid's aging and often overloaded electrical infrastructure, along with that of many facilities, makes operations more vulnerable than ever to power system disturbances.
Education In Maintenance & Reliability Engineering
By Ray Beebe
We recently learned about Monash University and its distance education programs for Maintenance and Reliability. We asked Ray Beebe to put together some details so we could bring that story to you. Here is what Ray sent us:
Effectiveness
The potential or actual probability that a system will perform its mission at a given level of performance under specified operating conditions, defined as the product reliability × availability × maintainability × capability. Many variants of the effectiveness equation exist, e.g., overall equipment effectiveness (OEE).
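Because effectiveness is a product of fractions, it is always lower than its weakest factor. The Python sketch below computes the four-factor product and, as one of the variants mentioned, OEE in its common form (availability × performance rate × quality rate); the function names and values are hypothetical.

```python
import math


def effectiveness(reliability: float, availability: float,
                  maintainability: float, capability: float) -> float:
    """System effectiveness as the product of four fractions, each in [0, 1]."""
    factors = (reliability, availability, maintainability, capability)
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("each factor must be a fraction between 0 and 1")
    return math.prod(factors)


def oee(availability: float, performance: float, quality: float) -> float:
    """Overall equipment effectiveness: availability x performance rate x quality rate."""
    return availability * performance * quality


# Hypothetical values: four good-looking factors still multiply to a much lower result.
print(f"effectiveness {effectiveness(0.95, 0.98, 0.90, 0.92):.3f}")
print(f"OEE           {oee(0.90, 0.95, 0.99):.3f}")
```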
Eliminating the Sources of Poor Reliability: Defects
By Winston Ledet
The basis for our approach originated in DuPont in the late 1980s. Faced with a decentralization of their maintenance department, DuPont was interested in how to sustain their functional excellence in maintenance. DuPont, normally an inwardly focused company, did something fairly unusual at the time. They engaged A.T. Kearney to benchmark their performance against the "best-of-the-best" in the U.S., Europe and Asia (Figure 1). DuPont found that they didn't need to worry about losing functional excellence because they did not have it to begin with. The most dramatic difference in performance was with the Japanese, who had much lower costs, greater reliability, and lower stores inventory.
Environmental Stress Screening (ESS)
A series of screens is conducted under environmental stresses to disclose weak parts and workmanship defects that require correction. This requires an understanding of both burn-in testing and ESS, since both techniques identify weak points and eliminate them by motivating early failures. Burn-in is usually a long process of operating under load at a fixed temperature (in short, a special case of ESS), although it can be run at varying loads and accelerated temperatures to achieve a shorter burn-in period. ESS, by contrast, is a scientifically planned and conducted test, usually run under accelerated loads, that produces the same test/use results in a shorter period of time by increasing the stress on the components or assemblies. The objective of these screens is to release a failure-free product into operations. ESS is not intended to validate compliance with a design; rather, it is intended to force latent defects to surface as failures before the end user finds them in day-to-day usage.
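Why screening works can be seen from the conditional reliability of the units that survive it. The Python sketch below assumes Weibull life, a burn-in run at use conditions (no acceleration modeled), and hypothetical values for the shape β, characteristic life, burn-in length, and mission. Survivors of the screen only do better when β < 1 (infant mortality); at β = 1 the screen changes nothing, and at β > 1 it consumes useful life.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    return math.exp(-((t / eta) ** beta))


def mission_reliability_after_burn_in(mission: float, burn_in: float,
                                      beta: float, eta: float) -> float:
    """Conditional reliability of units that survived a burn-in of length `burn_in`:
    R(burn_in + mission) / R(burn_in)."""
    return (weibull_reliability(burn_in + mission, beta, eta)
            / weibull_reliability(burn_in, beta, eta))


# Hypothetical population: eta = 10,000 h, 100 h burn-in, 1000 h mission, three shapes.
for beta in (0.5, 1.0, 2.0):
    unscreened = weibull_reliability(1_000, beta, 10_000)
    screened = mission_reliability_after_burn_in(1_000, 100, beta, 10_000)
    print(f"beta={beta}: unscreened {unscreened:.3f}, after burn-in {screened:.3f}")
```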
Events and Incidents
An event or incident is a single occurrence, usually a particularly significant one, that results in a failure from a non-aging mechanism. Usually the event/incident has the serious consequence of ending the functional life of a component or system. For reliability analysis of aging failure modes, the death of the device must be recorded as censored (suspended) data, because it is not the failure mode under study.
Exponential Distribution
The exponential distribution describes the probability of survival and of failure of components or equipment under the condition of chance failure, meaning a constant instantaneous failure rate: the die-off rate is the same for any surviving (unfailed) population. An old part is as good as a new part. In this memoryless system, for survivors of any age t, the same fraction will fail in any interval of a given length Δt, regardless of t. The reliability of a system is often described by the exponential distribution because many systems are made up of mixed failure modes that in aggregate behave like a constant failure rate system. The reliability of the exponential distribution is described mathematically as $R(t) = e^{-\lambda t} = e^{-t/\theta}$, where $t$ is the mission time, $\lambda$ is the failure rate, and $\theta$ is the mean time to failure, given that $\lambda = 1/\theta$. The exponential distribution is frequently used as a first approximation to describe reliability based on a simple failure rate or a simple mean time to failure, particularly if the system or component has multiple failure modes.
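The formula and the memoryless property can be checked numerically. The Python sketch below evaluates $R(t) = e^{-t/\theta}$ and the conditional probability that a unit of a given age survives a further interval; the mean time of 1000 h is a hypothetical value.

```python
import math


def reliability(t: float, mean_life: float) -> float:
    """Exponential reliability R(t) = exp(-t / theta), with failure rate lambda = 1 / theta."""
    return math.exp(-t / mean_life)


def conditional_reliability(age: float, interval: float, mean_life: float) -> float:
    """Probability that a unit already aged `age` survives a further `interval`."""
    return reliability(age + interval, mean_life) / reliability(age, mean_life)


theta = 1_000.0  # hypothetical mean time to failure, hours
print(f"R(theta) = {reliability(theta, theta):.4f}")  # about 36.8 % survive to the mean time
for age in (0.0, 500.0, 5_000.0):
    print(f"age {age:>6.0f} h: survive next 100 h = "
          f"{conditional_reliability(age, 100.0, theta):.4f}")
```

Every age gives the same conditional survival probability, $e^{-0.1}$, which is what "an old part is as good as a new part" means in numbers.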
Failure
Failure is the loss of function when you needed the function to occur. Failures for reliability purposes must be precisely defined so they are recorded correctly. Much life data is incomplete because failures are mixed with censored/suspended data: aged items that have not failed, removals from service before failure, or items that have not yet failed in the mode under study. In short, these censored/suspended items represent successes and are part of the data set for study.
Failure Forecast
Failure forecasting is a projection of failures into the future based on assumed or documented failure details. It is also known as risk analysis of future failures. For a constant failure rate system this is very straightforward. However, for failure modes where the failure rate increases with time (wear-out failure modes) or decreases with time (infant mortality failure modes), the analysis becomes more complicated, as described by the Abernethy Risk method in The New Weibull Handbook and implemented in the software package WinSMITH Weibull for predicting future failures. Likewise, reliability block diagrams are useful for predicting future failures when authentic failure details are supplied to the Monte Carlo models.
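For age-dependent failure modes, a common way to forecast is to sum each unfailed unit's conditional probability of failing during the forecast period, given its current age. The Python sketch below does this for Weibull life; it is a simplified illustration, not the Abernethy Risk procedure as implemented in WinSMITH Weibull, and it ignores replacement of failed units and new units entering service. The fleet ages, usage, and Weibull parameters are hypothetical.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Weibull probability of failure by age t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def forecast_failures(ages: list[float], usage: float, beta: float, eta: float) -> float:
    """Expected failures among currently unfailed units over the next `usage` time units.

    Each unit contributes its conditional failure probability
    (F(age + usage) - F(age)) / (1 - F(age)).
    Replacements of failed units and new units entering service are ignored.
    """
    expected = 0.0
    for age in ages:
        f_now = weibull_cdf(age, beta, eta)
        f_next = weibull_cdf(age + usage, beta, eta)
        expected += (f_next - f_now) / (1.0 - f_now)
    return expected


# Hypothetical fleet of unfailed units (current ages in hours) and 500 h of usage next period.
fleet_ages = [200.0, 400.0, 800.0, 1_200.0, 1_500.0]
for beta in (0.7, 1.0, 3.0):
    n_fail = forecast_failures(fleet_ages, 500.0, beta, 2_000.0)
    print(f"beta={beta}: expected failures next period = {n_fail:.2f}")
```

With β = 1 every unit contributes the same $1 - e^{-u/\eta}$ regardless of age, which is why the constant failure rate case is straightforward; with β > 1 the oldest units dominate the forecast.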