
Reliability and Life Testing of Broadband Capacitors

Abstract: PPI broadband capacitors implement their RF function by packing an enormous amount of capacitance in ultra-small packages. The nickel electrodes, very high dielectric constants, and extremely small inter-electrode spacings all act to achieve the performance/price requirements … but also create behaviors that affect useful life. This application note examines and explains the failure modes and, particularly, emphasizes the required testing and resultant wear-out-region statistical analyses that must be done to accurately characterize the results.

1.0 Introduction

Whether you’re buying a refrigerator or a broadband multi-layer ceramic capacitor (MLCC), there’s at least one property you want of each – you want it to last. While porcelain capacitors with precious metal electrodes have been around for many years with well-established failure modes and carefully documented procedures and statistics to measure their reliability, this is not the case with the newer nickel-electrode, micro-miniature ferroelectric high-capacitance devices used in broadband applications. Failure rates of the latter are often significantly more sensitive to applied voltage and device temperature than the older capacitors, have additional failure modes, and simply wear out in shorter time periods. Concomitant behavior is substantial lot-to-lot variation in reliability – all of which makes careful testing critical to screen out non-performing or under-performing parts.

This document describes how and why broadband capacitors fail, how PPI tests them for reliability, and how we calculate failure-rate results under various applied voltage/temperature conditions. We begin with a very brief review of some structural features and mechanisms that cause MLCC failure.

2.0 Failure Mechanisms of Thin-Layer Base-Metal Electrode (BME) Capacitors

Broadband capacitors are typically made with nickel electrodes and barium titanate (BaTiO3) ceramics, the latter because their high dielectric constants permit very large capacitances in very small volumes. These devices have capacitances that vary significantly with applied voltage (defined by a Voltage Coefficient of Capacitance, VCC) as well as temperature (defined by a Temperature Coefficient of Capacitance, TCC). Inter-electrode spacings can be as small as 0.7 um, and dielectric constants can exceed 5000. As shown in Fig. 1 from [1], these conditions create many opportunities and areas for things to go wrong.


Fig. 1 Weak areas in Thin-layer BME MLCCs

Barium titanate single crystals have the structure shown in Fig. 2a from [2], but this structure is modified in MLCC dielectrics to that shown in Fig. 2b [17].

Each grain of the dielectric consists of a cubic shell surrounding a tetragonal BaTiO3 core [3], [4]. The shell is comprised of rare earth elements such as Dysprosium (Dy), Holmium (Ho), and Erbium (Er) [4], which serve two functions: (a) to moderate the temperature sensitivity of the barium titanate dielectric constant, and (b) to reduce (although not eliminate) the positively-charged oxygen vacancies that result from firing in a reducing atmosphere, which is necessary to avoid oxidation of the base-metal electrodes. The oxygen vacancies migrate under exposure to an electric field and reduce the dielectric’s insulation resistance, leading to eventual device failure.

Irregularities in the electrode surfaces and edges, plus unavoidable impurities and voids, create further deviations from ideal behavior. Fig. 3 [1] is illustrative.


Fig. 3 Manufacturing non-idealities that contribute to BME capacitor failure

All the above, the result of manufacturing processes, contribute to failure, as do cracks in the ceramic. Cracks can be generated in the manufacturing process, by PC board flexure, temperature changes, soldering processes, handling (e.g., pick-and-place machines), and exposure to moisture or chemicals. Oxygen-vacancy migration along the surface of cracks facilitates local charge accumulation and accelerates the time to failure [5].

If a DC voltage is placed on an MLCC, a leakage current results; the ratio of the voltage to this leakage current is called the insulation resistance (IR), and is used as one measure of a capacitor’s integrity. There are a variety of physical mechanisms that create leakage currents, some strongly temperature dependent, some not, the details of which are beyond the scope of this discussion, but can be pursued by the interested reader [5]. A key observation regarding BME capacitor lifetime is offered in [6]:

Two failure modes can be identified in these BME capacitor lots: catastrophic and slow degradation. A catastrophic failure is characterized by a time-accelerating increase in leakage current that is mainly due to existing processing defects (voids, cracks, de-laminations, etc.) or to extrinsic defects. A slow degradation failure is characterized by a near-linear increase in leakage current against stress time; this is caused by the electromigration of oxygen vacancies (intrinsic defects).

But [7] notes that the defect mechanisms above – voids, cracks, and oxygen vacancy migration — often interact:

In the presence of defects, migration of oxygen vacancies towards the cathode is enhanced either by increased electric field at the defect area, as in case of thinning of the dielectric, or by increased mobility of oxygen vacancies, as in case of cracks. In either case, accumulation of positively charged vacancies at local areas results in increased leakage currents and formation of hot spots.

If the hot spots are such that the heat generated is balanced by that conducted away, then the leakage current reaches a stable value. But if this is not the case, and the temperature at the defect sites reaches the melting temperature of nickel (1455 °C) or the sintering temperature of the ceramic (≈1250 °C), then a catastrophic failure occurs.

3.0 Reliability Calculations and Formulas

Reliability predictions frequently begin with the so-called “bathtub” curve, which describes the failure rates of electronic components and equipment; it is shown in Fig. 4.


Fig. 4 The “bathtub” curve: a representation of electronic component failure rates

The Early Life or “Infant Mortality” segment models failures generally resulting from manufacturing process imperfections, material defects, and damages in handling — some controllable, some not. Failures in the Steady-State period typically result from either random events (e.g., in the case of memory chips, cosmic ray impacts) or various infant mortality mechanisms that are sufficiently spread out in time that they appear to be random. The final curve portion, the so-called Wear-Out stage, results from fundamental physical limitations of the materials and interfaces.

The instantaneous failure rate, λ(t) – also known as the Hazard Function — is defined as the rate of change of the cumulative failure probability divided by the probability that a unit will not already have failed by time t. If the cumulative fraction of failures from time T = 0 to T= t is denoted by F(t), the hazard rate is

(1)

λ(t) = [dF(t)/dt]/[1 − F(t)]

For each segment of F(t) corresponding to that of λ(t), there is a very flexible expression that can be fit to the data points: the Weibull characteristic.  The 2-parameter Weibull cumulative distribution function (cdf), the cumulative probability of failure at time t (or Unreliability Function), is given by

(2)

F(t) = 1 − exp[−(t/η)^β]

Here, η is a scale parameter — the time at which 63.2% (= 1 − e^−1) of the population has failed — and β is a dimensionless parameter that defines the shape of the curve and whose value is often characteristic of the failure mode under study. Fig. 5 shows the typical form of F(t).

Fig. 5 Cumulative probability of failure vs. time

The bottom plot segments in Fig. 5 are linearized versions of those in the top diagram, which is accomplished by rearranging (2) and taking the natural logarithm of both sides twice. One can then convert it to a linear form such that

(3)

y = βx − β·ln(η)

where x = ln(t)

y = ln(ln(1/(1 − F(t))))

Thus, as illustrated in Fig. 5b and Fig. 6, if we plot the cumulative percent failures at each of the times Ti on a graph with a logarithmic x-axis (xi = ln Ti) and a y-axis yi = ln(ln(1/(1 − F(Ti)))), the points will lie, in most cases, on or close to a straight line, thereby indicating their conformity to a Weibull distribution. The slope of the line is β.


Fig. 6 Basic Weibull plot
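The plotting procedure above can be sketched in a few lines of code. This is an illustrative sketch only: the failure times are hypothetical, and Benard's median-rank approximation is assumed as the estimator of F(Ti), since the text does not prescribe one.

```python
import math

def weibull_plot_points(failure_times):
    """Median-rank estimate of F(Ti) plus the linearizing transform of (3):
    x = ln(Ti), y = ln(ln(1/(1 - F(Ti))))."""
    n = len(failure_times)
    pts = []
    for i, t in enumerate(sorted(failure_times), start=1):
        f = (i - 0.3) / (n + 0.4)  # Benard's median-rank approximation (an assumption)
        pts.append((math.log(t), math.log(math.log(1.0 / (1.0 - f)))))
    return pts

def fit_line(points):
    """Ordinary least squares; the slope of the fitted line is the Weibull beta."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

# Hypothetical failure times in hours, for illustration only
times = [320, 550, 710, 980, 1200, 1500, 1900, 2600]
beta, intercept = fit_line(weibull_plot_points(times))
eta = math.exp(-intercept / beta)  # from intercept = -beta * ln(eta)
```

A straight-line fit on these transformed axes is exactly the graphical Weibull fit of Fig. 6; a markedly non-linear point pattern would signal departure from a single Weibull mode.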

The dashed extension of the steady-state unreliability plot in Fig. 5b illustrates a particular issue with many broadband thin-layer base-metal-electrode (BME) multi-layer ceramic capacitors (MLCCs): Unlike with High-Q MLCCs, where the wear-out region begins after such a long period that it can be neglected in projecting cumulative percent failures, the wear-out region in the BME devices generally occurs much earlier. Thus, as the dashed extension illustrates, substantial errors – the difference between t2 and t1 — will result if only steady-state failure rates are measured and assumed to apply for all time.

The above may simply be another way of saying that there is nothing sacred about the bathtub curve (Fig. 4) – in actuality, failure rate vs. time may not be bathtub-shaped at all but rather more like a U or V, with wear-out beginning almost immediately after infant mortality.

Combining (1) and (2) gives us the equation for the segments of the bathtub curve (Fig. 4), the Hazard Function defined earlier:

(4)

λ(t) = (β/η)·(t/η)^(β−1)

Note that if β=1, the failure rate reduces to a constant, 1/η. If β <1, the failure rate decreases with time; if β > 1, it increases with time. It is these properties that enable Weibull functions to be fit to all portions of the bathtub and cumulative-percentage-failure plots.
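These three regimes of the hazard function can be verified numerically; the η value below is hypothetical.

```python
def weibull_hazard(t, eta, beta):
    """Instantaneous failure rate per (4): (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# With a hypothetical eta of 1000 hours:
infant  = [weibull_hazard(t, 1000.0, 0.5) for t in (10, 100, 1000)]  # beta < 1: falling
steady  = [weibull_hazard(t, 1000.0, 1.0) for t in (10, 100, 1000)]  # beta = 1: constant 1/eta
wearout = [weibull_hazard(t, 1000.0, 3.0) for t in (10, 100, 1000)]  # beta > 1: rising
```

The three lists reproduce, in order, the infant-mortality, steady-state, and wear-out portions of the bathtub curve.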

The Mean Time To Failure (MTTF) of a sample test population is given by

(5)

MTTF = η·Γ(1 + 1/β)

where Γ(n) is the gamma function of n:

Γ(n) = ∫₀^∞ x^(n−1)·e^(−x) dx

Note, if β = 1, Γ(2) = 1 and MTTF = η.

Unlike when the failure rate is a constant (see later), the uncertainty interval of the MTTF for the general Weibull distribution is complicated to compute; Likelihood Ratio Bounds, Fisher Matrix, Beta Binomial, and other methodologies whose details and derivations are beyond the scope of this discussion are used to do so [8], [9], [10] … although later herein we’ll examine some of the basic ideas.
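The point estimate itself, by contrast, is a one-liner via (5); the parameter values below are hypothetical.

```python
import math

def weibull_mttf(eta, beta):
    """Mean Time To Failure per (5): eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

# beta = 1 recovers MTTF = eta; beta > 1 pulls the MTTF below eta
mttf_exponential = weibull_mttf(1000.0, 1.0)  # = 1000.0
mttf_wearout     = weibull_mttf(1000.0, 2.0)  # = 1000 * Gamma(1.5), about 886.2
```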

The time at which a given percent of the population will have failed, the so-called BX(%) life, is given by

(6)

BX(%) life = η·{−ln[R(x%)]}^(1/β)

where R(x%) = 1 − F(x%)

F(x%) = cumulative failures as percentage of number of samples
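Relation (6) evaluates directly; the sketch below uses hypothetical Weibull parameters for illustration.

```python
import math

def bx_life(eta, beta, x_percent):
    """BX(%) life per (6): eta * (-ln R)**(1/beta), where R = 1 - x/100."""
    r = 1.0 - x_percent / 100.0
    return eta * (-math.log(r)) ** (1.0 / beta)

# Hypothetical parameters: eta = 1000 h, beta = 2
b1  = bx_life(1000.0, 2.0, 1.0)    # time by which 1% of the population has failed
b10 = bx_life(1000.0, 2.0, 10.0)   # time by which 10% has failed
```

Note that B63.2 recovers η itself, consistent with the definition of the scale parameter under (2).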

Substituting η from (5) into (2) yields

(7)

F(t) = 1 − exp{−[t·Γ(1 + 1/β)/MTTF]^β}

Whence, at t = MTTF, the cumulative probability of failure is

(8)

F(MTTF) = 1 − exp{−[Γ(1 + 1/β)]^β}

Table 1 shows the percentage value of F(MTTF) for various values of β.

Table 1  Cumulative failure percentage at the MTTF for various values of β

We should observe that most users become concerned at cumulative failure percentages much lower than those corresponding to the MTTF.

The rate of the thermally activated reactions that drive failure is assumed to follow an Arrhenius relation,

(9)

R(T) = A·exp[−Ea/(z·T)]

where Ea = activation energy in eV

z = Boltzmann constant = 8.62 x 10^−5 eV/K

T = absolute temperature in K

A = a constant

Before proceeding, three caveats concerning relations (5) and (6) should be noted:

  1. MTTF is a substantially limited indicator of reliability [11].  Devices can have the same MTTF, but very different reliability characteristics; i.e., the reliability can differ at different times.  Only when β =1, the steady-state, can reliability and failure rates be described by single numbers.  Thus, a cumulative percent failure characteristic is a much better descriptor of failure (unreliability) over time than an MTTF.
  2. The MTTF and BX(%) parameters above refer to each single segment of the unreliability curve.  When more than one segment must be considered, e.g., the steady-state plus the wear-out region, (5) and (6) are not applicable to the combined characteristic unless the steady-state cumulative failure percentage at the time wear-out begins is negligible.
  3. Table 1 indicates MTTFs in the steady-state and wear-out regions correspond to cumulative failure percentages ranging from 63.2 to 43.0.  But, as noted, most users become concerned at much lower percentage failures, e.g., those in the 1-10% range.  Thus, failure times in that range are significantly more meaningful as metrics of reliability.
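The 63.2-to-43.0 percentage range quoted in caveat 3 follows directly from (8); a minimal numerical check (sketch):

```python
import math

def cum_failure_at_mttf(beta):
    """Cumulative failure probability at t = MTTF per (8):
    F(MTTF) = 1 - exp(-Gamma(1 + 1/beta)**beta)."""
    return 1.0 - math.exp(-math.gamma(1.0 + 1.0 / beta) ** beta)

f_steady  = cum_failure_at_mttf(1.0)    # 0.632 at beta = 1 (steady state)
f_wearout = cum_failure_at_mttf(20.0)   # tends toward roughly 0.43 as beta grows
```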

In order to measure failure rates, it is usually necessary to perform Highly Accelerated Life Tests (HALT), wherein the applied voltage and temperature are both raised to levels that produce measurable numbers of failures in reasonable test times.  To relate the number of these failures to other voltages, temperatures, and exposure times, the acceleration factors must be known.

The temperature acceleration factor, AT, is most often considered [3] to follow the Arrhenius life stress model, wherein life is assumed proportional to the inverse of the reaction rate R(T) of (9), and

(10)

L(T) = K·e^(G/T)

where G = Ea/z

K = a constant

And the acceleration factor is given by

(11)

AT = exp[G·(1/TU − 1/TA)]

where TA = accelerated test temperature in K

TU = use (operating) temperature in K

The life dependence on voltage is most often considered [3] to fit a power law,

(12)

L(V) = D·V^n

where D = a constant

n = a characteristic exponent

Whence the voltage acceleration factor, AV, is

(13)

AV = (VU/VA)^n

where VA = accelerated test voltage

VU = use (operating) voltage

Unlike for precious-metal-electrode (PME) capacitors, where a value for n of 3.0 is generally accepted, e.g., specified in MIL-PRF-55681, and a value for Ea of about 1.0 eV is similarly accepted, the values of these parameters for thin-layer BME broadband capacitors vary substantially, depending on device particulars. References in the literature [12], [13], [14] report n values ranging from 1.5 to 7.1, while Ea values extend from 1.1 to 1.50 eV. Measurements at Passive Plus have indicated n values as high as 6.70 and Ea values ranging up to 1.33 eV.
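As an illustration of (11) and the power law, the sketch below evaluates the factors for one hypothetical HALT condition; Ea = 1.3 eV and n = 5 are assumed values inside the ranges quoted above, not PPI data. The voltage factor is expressed here as the test-time compression (VA/VU)^n, the reciprocal of the life ratio written in (13).

```python
import math

Z_BOLTZMANN = 8.62e-5  # eV/K, as in (9)

def thermal_accel(ea_ev, t_use_k, t_accel_k):
    """Temperature acceleration per (11): exp[G*(1/T_use - 1/T_accel)], G = Ea/z."""
    g = ea_ev / Z_BOLTZMANN
    return math.exp(g * (1.0 / t_use_k - 1.0 / t_accel_k))

def voltage_accel(n, v_use, v_accel):
    """Test-time compression implied by the power law (12): (V_accel/V_use)**n."""
    return (v_accel / v_use) ** n

# Hypothetical HALT condition: use at 85 C / 50 V, test at 125 C / 100 V
a_t = thermal_accel(1.3, 85.0 + 273.15, 125.0 + 273.15)
a_v = voltage_accel(5.0, 50.0, 100.0)  # 2**5 = 32
total = a_t * a_v
```

Even this modest over-stress multiplies the effective test time by several thousand, which is what makes HALT practical; it also shows why uncertainty in n and Ea translates into large uncertainty in extrapolated life.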

Since the Unreliability function (cumulative percent failure vs. time) is a random variable, its value, in general, will differ from test to test, i.e., the Unreliability of any particular population sample will, in general, differ from the failure rate of the entire population. This situation is often characterized by a “confidence level” – the percentage of times samples of a given size are expected to fall within a “confidence interval” – the upper and lower bounds of the estimate expected at a given confidence level. The diagrams of Figs. 4 and 5 do not reflect this reality: The distinct lines representing the probability densities and distributions should really be smears, the extent of which reflect the confidence levels and intervals.

Let L(V,T) be some measure of life — e.g., MTTF, BX(%) life, etcetera — as a function of both voltage and temperature. Then, from (10) and (12), we can express that life as

(14)

L(V,T) = C·V^n·e^(G/T)

where V = applied voltage

T = temperature

C = a constant to be determined

G and n, as defined under (10) and (12) respectively, are parameters to be determined

4.0 Steady-state Calculations and Formulas

When a population sample is measured for reliability, the "raw" failure rate is simply the total number of failures, r, divided by the number of component-hours, i.e.,

(15)

λraw = r/(N·t)

where r = total number of failures

N = number of components tested

t = test duration in hours

Let us now consider the constant failure rate region of the bathtub curve. As noted, the Unreliability of any particular population sample will, in general, differ from the failure rate of the entire population, and so we need to use a statistical approach to arrive at the latter. In particular, we’re interested in the single-sided confidence level, that percentage of tests of the same number of parts that yields a given lower bound on the population survival rate. Fig. 7 below, adapted from [23], illustrates the two kinds of confidence bounds, two-sided and single-sided.

Fig. 7B, for example, depicts a situation that lets us make the statement, “If we do a large number of tests of the same-size population samples, 95% of them will survive for at least 10 years.”

To determine the single sided confidence bound using the classical statistics approach, it can be shown [24], [25], [29] that we must replace the number of failures in (15) by one-half the argument of a particular probability distribution, chi-square. I.e.,

(16)

λ = χ²(α; 2r + 2)/(2·N·t)

where χ²(α; 2r + 2) = the argument (x-axis value) of a chi-square probability cumulative distribution function of (y-axis) value α and "degrees of freedom" parameter 2r + 2 (r being the number of failures)

1 − α = confidence level

The chi-square distribution results when ν independent variables with standard normal distributions are squared and summed [15]. 

The perhaps deceptively simple (16) requires explanations in several areas that remain beyond the present discussion’s scope, but that are addressed (with varying completeness and clarity) in the references below:

  • Where it originated, i.e., derivation of the chi-square distribution (from the Poisson and Gamma distributions) [27], [28]
  • Ambiguity of the symbol χ² — is it a probability? (No.) Is it a probability density? (No.) [29], [30], [31] It’s an x-axis value (possible values of a random variable, in this case, the number of failures) of either a pdf or cdf, and can be conveniently calculated using the Excel functions CHIINV or CHISQ.INV.RT, where the “INV” is the clue. [31] comes closest in explanation.
  • Is α a probability? (Yes.) And, if so, why is the confidence level 1 − α? Appendix A of [31] again addresses this most directly. (Hint: Failure rate has to do with the probability of failure, rather than the probability of survival, so the chi-square calculation must use the lower limit, while the Excel functions noted above use the upper limit.)

The reliability of an MLCC can then be expressed as the product of the statistical model parameters above and the acceleration functions or “stress factors” resulting from electrical and thermal conditions, as given in (13) and (11).  Taking into account all the above, the classical statistics formula for λss is

(17)

λss = [χ²(α; 2r + 2)/(2·N·t·AV·AT)] x 10^5

where λss = failure rate in % per 1000 hours

AV = electrical stress factor, see (13)

AT = thermal stress factor, see (11)

The MTTF, assuming that the constant failure rate applies forever – an assumption generally not true for broadband devices such as thin-layer BME X7R MLCCs (see the first paragraph under Fig. 6) – is given by

(18)

MTTF (hrs.) = 1/λss = 10^5/λss(%)

where λss(%) is the numerical failure rate in %/1000 hrs.

FIT (Failures in Time) is another standard industry value, defined as the failure rate per billion hours. Thus,

(19)

FITs = λss x 10^9 = λss(%) x 10^4

Note, again, that the above relationships define a single-sided lower confidence bound (see Fig. 7B).
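The chain (16)-(19) can be reproduced without Excel. The sketch below assumes the time-terminated form with 2r + 2 degrees of freedom and builds the CHIINV counterpart in pure Python, using the closed-form upper tail available for even degrees of freedom; the worked case (2 failures in 100 parts over 1000 hours, 90% confidence, unity stress factors) is hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for chi-square with EVEN df = 2k (Poisson-sum closed form);
    df = 2r + 2 is always even, so this suffices here."""
    k = df // 2
    total, term = 0.0, 1.0
    for i in range(k):
        if i > 0:
            term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

def chi2_inv_upper(p_upper, df):
    """Counterpart of Excel's CHIINV / CHISQ.INV.RT, via bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > p_upper:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_upper_tail(mid, df) > p_upper:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_ss(r, n_parts, hours, confidence, a_v=1.0, a_t=1.0):
    """Single-sided bound per (16)-(17), in percent per 1000 hours."""
    chi2 = chi2_inv_upper(1.0 - confidence, 2 * r + 2)
    return chi2 / (2.0 * n_parts * hours * a_v * a_t) * 1.0e5

lam  = lambda_ss(2, 100, 1000.0, 0.90)  # roughly 5.3 %/1000 h
mttf = 1.0e5 / lam                      # hours, per (18)
fits = lam * 1.0e4                      # per (19)
```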

5.0 Test Descriptions

Let us now briefly describe the testing typically done to characterize Life.

Wear-out Region HALT.  These tests have two objectives:

  • To determine the constants, β, C, G (= Ea/z), and n needed to derive reliability behavior, e.g. BX(%) life, in the wear-out region, and determine the acceleration factors that enable reliability behavior in both the steady-state and wear-out regions to be extrapolated to voltages and temperatures other than those of the HALTs.
  • To help determine the time at which the steady-state region ends and the wear-out region begins.

Tests are usually done on a number of samples at various applied voltages and temperatures.  The times at which failures occur – with failures defined as direct current through the parts exceeding a specified value – are recorded, and the entire test terminated after a specific time.  

Fig. 8 depicts the setup on which the HALT tests are typically made.


Fig. 8  HALT test setup

Units are generally contacted with spring clips to avoid the possibility of damage resulting from soldering.  

Steady-state Life Test.  The objective of these tests is to:

  • Characterize reliability in the steady-state region, generally by a single-sided lower confidence bound on life
  • Establish changes in such electrical parameters as capacitance, DF, IR at 25 °C, IR at the highest rated temperature, and mechanical integrity.

Tests are usually performed at the maximum rated temperature but at higher-than-rated voltage in order to accelerate the number of failures in a given time.  The electrical parameters noted are monitored at various elapsed times, typically in the 250 – 4000 hour range, and failures are defined by maximum specified changes in those parameters.

6.0 Issues and Complications in Determining Reliability

So it seems as if, to determine the reliability parameters, all we need do is take appropriate data, plug it into (1) – (19) above, and we’d be all set. If only life (and Life) were so straightforward …. In actuality, there are a number of issues and complications in determining the failure rate of thin-dielectric X7R/X5R base-metal electrode (BME) MLCCs, and they seem to fall into two fairly broad categories: computational and non-computational. We begin with the latter.

6.1  Non-Computational Considerations in Reliability Measurements

  1. In general, there is always a tradeoff between the number of component test hours, e.g., N in (15), costs, and time.  Thus, the greater the desired estimation accuracy of cumulative percent failures in a population, the more testing, time, and cost will be incurred.
  2. Temperature constraints impose further limits: Care must be taken not to exceed the Curie point.  Above this temperature the barium titanate crystal structure is cubic; below the Curie point, it’s tetragonal [21].  Since the Curie temperature ≈ 125 °C, any measurements above that point are problematic.
  3. The effort to construct a Weibull plot that does not assume a constant failure rate requires more elaborate instrumentation and data recording than that needed to monitor steady-state conditions, as well as more measurement and computation time.
  4. “Failure” of a capacitor is defined in a number of different ways by different standards, manufacturers, and researchers.
    i. Some researchers define it by a maximum fixed leakage current level at the applied test voltage, e.g., 100 uA or 10 mA.
    ii. Others define it by a maximum change in capacitance value or change in insulation resistance (IR).
    iii. References [18], [19], and [20] (MIL-PRF-55681, MIL-PRF-123, and MIL-PRF-32535, respectively) – the lattermost generally considered the standard most applicable to the capacitors discussed herein — define failure by maximum changes in a number of parameters: capacitance, IR at the life test temperature and at room temperature, visual examination, and a maximum fixed dissipation factor. While leakage current can be readily monitored continuously during a life test, the standards’ specifications on capacitance, IR, and DF apply only at fixed test time intervals, e.g., 250 hours, 1000 hours, 2000 hours, etc., and so the exact times at which failures may have occurred are not known.
    iv. A number of manufacturers have failure definitions per iii, but add an additional step before initial and final measurements: The test samples have their Curie temperatures re-set, i.e., they are placed in an oven for a certain time at a specified temperature, e.g., 150 °C, then removed and permitted to cool for a certain period, e.g., 24 or 48 hours, and only then are the measurements cited performed. Thus, these results may not reflect actual behavior under operating conditions, where no Curie-temperature re-sets apply.
  5. As noted in Section 1.0 herein, there is more than one failure mode in capacitors of this type. Fig. 9, taken from [16], illustrates this. The two failure modes, “slow degradation” and “catastrophic,” result from the device structure and various physical phenomena: oxygen vacancy motion; dielectric defects, voids and discontinuities; metallization surface roughness and voids; particulates; and grain size relative to dielectric thickness. Regarding failure time dependence on applied voltage, [16] states that the catastrophic mode is best characterized by the power-law relation of (12) herein, while the slow degradation mode is best described by an exponential relation, and a formula using both is offered as the most complete descriptor (eq. 15 therein).

    Fig. 9 Failure modes in thin-layer, base-metal electrode MLCCs
    But nature is not so definitive: Depending on where the failure level is set, the modes may not be distinguishable. Consider, for example, in Fig. 9, if the failure criterion had been set at 25 uA instead of 100 uA: The leakage current patterns would’ve been the same and the “slow degradation” would’ve occurred earlier than the catastrophic one. If the failure criterion had been set at 175 uA, both failure types would look catastrophic.

    Further, as noted in Section 1.0, the modes almost surely couple to one another; that is, the various physical defects facilitate the transport of oxygen vacancies. Thus, the idea that the two failure modes can be distinguished by Weibull plots of different slopes and shape parameters is problematic.

6.2 Computational Approaches and Issues in Reliability Determination.

We begin with some definitions of terms that apply to the computational processes.

Censored data types. From [26],

Censored data is any data for which we do not know the exact event time. There are three types of censored data; right censored, left censored, and interval censored. Data for which the exact event time is known is referred to as complete data.

Referring to the tests described in 5.0 herein, our wear-out region HALT data is of two types: complete for those parts that failed (since we know the exact failure times), but right censored, i.e., terminated at a particular time, for those that survived. Those parts that have not failed are referred to as “suspensions,” and are treated differently from the failures in the statistical analysis.

From [26], interval censored data is when the exact failure times are not known, but the upper and lower bounds of an interval surrounding the failure are known. Therefore, referring to the steady-state HALT protocol, the lower bound is at the beginning of the test, but the exact failure times – if any – are not known before the test is simply terminated at the upper time bound. Note that even if the beginning of the interval is at time zero, interval censored data differs from right censored data in that the latter pertains to parts that survived, whereas the former applies to parts that failed at unknown times.

Types of Confidence Bounds. From [22]:

A “Type 1” cumulative failure percentage (CFP) plot presents confidence bounds on time at designated cumulative failure percentages. A “Type 2” CFP plot presents confidence bounds on failure percentages at designated times. Fig. 10 from [22] illustrates the two plot types.

Fig. 10 – Type 1 and type 2 Cumulative Failure Percentage plots

Maximum Likelihood Estimation – Basic Methodology.

Suppose we assume that a given distribution expression, say Weibull or log normal, describes (can be fit to) a series of lifetime data points.  Maximum likelihood estimation is a method of finding the most “likely” values of the distribution parameters.  It does this by maximizing a likelihood function comprised of three types of terms created by the following rules, depending on whether the data is complete (known failure time), suspended (right censored, i.e., no failure, test simply stopped at a given time), or interval censored (there are failures, and known upper and lower times surrounding them, but the exact failure times are unknown):

  • Complete data points – Substitute each known time to failure, Ti, in the expression for the probability density function (pdf), and then take the product of the resulting terms.  Thus, consider the pdf as a function of t and θ1, θ2, θ3 … θk, the parameters to be estimated.  E.g., for the two-parameter Weibull distribution, θ1 = η and θ2 = β.  Then the likelihood terms for the data points would be

    (20)

    Lcomplete = ∏(i=1 to r) f(Ti; θ1, θ2, … θk)


    Since the 2-parameter Weibull pdf is given by

    (21)

    f(t; η, β) = (β/η)·(t/η)^(β−1)·exp[−(t/η)^β]


    each failure time would be substituted in (21) and all the resulting terms multiplied together to arrive at the product in (20). The result is,

    (22)

    Lcomplete = ∏(i=1 to r) (β/η)·(Ti/η)^(β−1)·exp[−(Ti/η)^β]

  • Data suspension points – Substitute each data suspension point (those parts that didn’t fail before testing was terminated) into a cumulative distribution function (cdf). The cdf of the 2-parameter Weibull distribution for surviving parts is, from (2),

    (23)

    1 − F(t) = exp[−(t/η)^β]
    Whence, the corresponding likelihood terms are given by

    (24)

    Lsusp = ∏(j=1 to m) [1 − F(Tj′; θ1, θ2, … θk)]

    Thus, the termination times, Tj′, of the testing (usually – but not always — the same for all samples) would be substituted for t, and then the resulting terms multiplied together. Again, for the 2-parameter Weibull distribution, this yields

    (25)

    Lsusp = ∏(j=1 to m) exp[−(Tj′/η)^β]

  • Interval censored points – Substitute the upper and lower times of the intervals, TU,k and TL,k, in the cdfs of the failing parts (F(t) for the Weibull distribution), i.e.,

    (26)

    Lint = ∏(k=1 to p) [F(TU,k) − F(TL,k)]

    which, for the two-parameter Weibull distribution becomes,

    (27)

    F(TU,k) − F(TL,k) = exp[−(TL,k/η)^β] − exp[−(TU,k/η)^β]

    whence

    (28)

    Lint = ∏(k=1 to p) {exp[−(TL,k/η)^β] − exp[−(TU,k/η)^β]}

  • The total likelihood function for an assumed Weibull distribution is then

    (29)

    L(η, β) = Lcomplete × Lsusp × Lint
  • Next, the natural logarithm is taken of L, which converts the products into sums, ln L. The logarithm has its maximum at the same point as the original function, and the sum is more tractable numerically than the product. (The latter would involve comparison of extremely small numbers.)
  • Then, the partial derivatives of ln L with respect to each θj are taken and set to zero, i.e., ∂(ln L)/∂θj = 0. This results in a number of equations with an equal number of unknowns, which can be solved simultaneously, although numerical techniques may be necessary where there are no closed-form solutions for the partial derivatives.
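The steps above can be exercised end-to-end for the common case of complete failures plus right-censored suspensions. The data below is hypothetical, and the one-dimensional profile search (using the closed-form η that zeroes ∂ln L/∂η for a fixed β) is just one of several ways to carry out the maximization.

```python
import math

def weibull_mle(failures, suspensions):
    """Maximum likelihood estimate of (eta, beta) for complete failure times
    plus right-censored suspensions, per (20)-(29). For a fixed beta, eta has
    the closed form eta = (sum(t**beta over all units)/r)**(1/beta), so only a
    1-D search over beta is needed."""
    r = len(failures)
    all_t = failures + suspensions

    def eta_for(beta):
        return (sum(t ** beta for t in all_t) / r) ** (1.0 / beta)

    def log_lik(beta):
        eta = eta_for(beta)
        ll = 0.0
        for t in failures:     # pdf terms, as in (22)
            ll += math.log(beta / eta) + (beta - 1) * math.log(t / eta) - (t / eta) ** beta
        for t in suspensions:  # survival terms, as in (25)
            ll -= (t / eta) ** beta
        return ll

    # golden-section search for the beta maximizing the profile likelihood
    lo, hi = 0.05, 20.0
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if log_lik(a) < log_lik(b):
            lo = a
        else:
            hi = b
    beta = 0.5 * (lo + hi)
    return eta_for(beta), beta

# Hypothetical HALT result: 6 failures, 4 units surviving to 3000 h
eta_hat, beta_hat = weibull_mle([400, 800, 1100, 1600, 2200, 2900],
                                [3000, 3000, 3000, 3000])
```

Note how the suspensions enter only through the survival terms of (25); ignoring them would bias η low, which is exactly why suspensions are "treated differently from the failures" as stated earlier.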

A visual representation of the likelihood function surface is presented in [8] for a particular two-parameter Weibull distribution and reproduced herein in Fig. 11.


Fig. 11 Likelihood surface

The maximum likelihood estimate occurs at the parameter values on the x-and y-axes corresponding to the peak. If we imagine taking cross-sections parallel to the eta-beta plane, we can envision various contours at different elevations. An example of one such “contour-plot” is shown in Fig. 12.

Fig. 12 An eta-beta contour plot

The elevation of the cutting plane (parallel to the η-β plane) is determined by the confidence level. Values βU and βL are the upper and lower confidence limits for β, while ηU and ηL are the corresponding limits for η.

Per [9] and [23], the boundaries of the contour plot represent the extreme values of the parameters that satisfy the equality in the likelihood ratio equation

    L(θ1, θ2) ≥ L(θ̂1, θ̂2) · e^(−χ²(α; k)/2)    (30)

where

    L(θ1, θ2) = likelihood function for the unknown parameters θ1 and θ2. (Recall, in our 2-parameter Weibull distribution, θ1 = η and θ2 = β.)

    θ̂1, θ̂2 = estimated values of θ1 and θ2 that yield L(θ̂1, θ̂2), the likelihood function maximum (see Fig. 12).

    L(θ̂1, θ̂2) = maximum of the likelihood function, calculated at θ1 = θ̂1 and θ2 = θ̂2

    χ²(α; k) = the chi-squared statistic with probability α (= % confidence level/100) and k degrees of freedom, where k is the number of quantities jointly estimated (e.g., k = 2 if η and β are being jointly estimated)

Note, no derivation has been offered for (30) – it’s (ugh! again) one of those subjects “beyond the scope of the present paper” — but [33] and [34] are references on the subject.

Following [9], for the two-parameter (η and β) extreme-value case, we can rearrange (30) to be:

    L(η, β) = L(η̂, β̂) · e^(−χ²(α; k)/2)    (31)

Then, for a given confidence, α, the term on the right of the equality is known, and we can find (numerically) the values of η and β that satisfy the equation … and produce therefrom a contour plot. (Again, we’ve omitted the step of taking the log of the likelihood function, as would be necessary in more complicated, multi-data-point instances.)
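One way to find such η-β pairs numerically is sketched below (hypothetical complete-failure data; a one-degree-of-freedom chi-squared cutoff is assumed, as is common for bounds on each parameter separately): grid-scan candidate pairs and retain those whose likelihood stays above the cutoff. The extremes of the retained region approximate the contour of Fig. 12.

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize

failures = np.array([55.0, 80.0, 120.0, 160.0, 210.0])  # hypothetical, complete data

def log_lik(eta, beta):
    # 2-parameter Weibull log-likelihood for failure-only data
    return np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(failures / eta)
                  - (failures / eta) ** beta)

# Locate the maximum-likelihood point (eta_hat, beta_hat).
res = minimize(lambda p: -log_lik(np.exp(p[0]), np.exp(p[1])),
               x0=[np.log(150.0), np.log(1.5)], method="Nelder-Mead")
eta_hat, beta_hat = np.exp(res.x)
ll_max = log_lik(eta_hat, beta_hat)

# Likelihood-ratio cutoff at 90% confidence; df=1 assumed per-parameter.
cutoff = ll_max - chi2.ppf(0.90, df=1) / 2.0

# Grid-scan: points with log L >= cutoff lie inside the confidence contour.
etas = np.linspace(0.5 * eta_hat, 2.0 * eta_hat, 200)
betas = np.linspace(0.3 * beta_hat, 3.0 * beta_hat, 200)
inside = [(e, b) for e in etas for b in betas if log_lik(e, b) >= cutoff]
eta_vals = [e for e, _ in inside]
beta_vals = [b for _, b in inside]
print(f"eta in [{min(eta_vals):.1f}, {max(eta_vals):.1f}], "
      f"beta in [{min(beta_vals):.2f}, {max(beta_vals):.2f}]")
```

A brute-force scan like this is adequate for two parameters; production software typically traces the contour with root-finding instead.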

MLE Type 1 and Type 2 Confidence Limits on Cumulative Failure Percentages

The above procedure permits us to determine the maximum likely (best-fit) values of β and η — β̂ and η̂, respectively – and, using these values in (2) and (6), to arrive at the unreliability (cumulative failure percentages, CFPs) vs. time. But, although we have a method of determining the confidence limits on η and β, we’ve not yet derived the two types – see Fig. 11 — of confidence limits on the CFPs. To do so, we must make the latter, or, equivalently, the reliability (R = 100% – CFP), a parameter of the likelihood function. That is, we need to re-write the likelihood function in terms of one parameter and time, and then employ the equivalent of (31) to determine the confidence limits.
Specifically:

  • For Type 1 CFP, confidence bounds on time at a designated cumulative failure percentage, we first solve (23) for η,

    η = t / (−ln R)^(1/β)    (32)

    and then substitute this expression into the likelihood equations (20) – (29), such that the likelihood becomes a function of β and t alone:

    L = L(β, t)    (33)

    Note that: (1) in this Type 1 case, R is considered a known, specified variable; and (2) we’re not considering any interval-censoring, i.e., the interval-censored term in (29) is taken as 1.
    Then, following the path that led to (31),

    L(β, t) = L(β̂, t̂) · e^(−χ²(α; k)/2)    (34)

    And we look for value pairs of t and β that satisfy the equation. The maximum and minimum values of t, TU and TL respectively, then determine the limits on time at a given reliability – or CFP – and confidence level, α.

  • For Type 2 CFP, confidence bounds on CFP at a designated time, the same substitution for η is made as in (32), but now t is considered a known, specified variable, which leads to

    L(β, R) = L(β̂, R̂) · e^(−χ²(α; k)/2)    (35)

    And we look for R and β value pairs that satisfy the equation. The maximum and minimum values of R, RU and RL respectively, then determine the limits on reliability or CFP at a given confidence level, α.
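The Type 2 calculation can be sketched as follows (Python; hypothetical data, and a one-degree-of-freedom chi-squared cutoff assumed). At each candidate reliability R the likelihood is profiled — maximized over β with R held fixed via the substitution (32) — and the R values that clear the cutoff form the confidence band at the designated time.

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize

failures = np.array([55.0, 80.0, 120.0, 160.0, 210.0])  # hypothetical, complete data
t0 = 100.0  # the designated time (hours)

def log_lik(eta, beta):
    return np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(failures / eta)
                  - (failures / eta) ** beta)

res = minimize(lambda p: -log_lik(np.exp(p[0]), np.exp(p[1])),
               x0=[np.log(150.0), np.log(1.5)], method="Nelder-Mead")
eta_hat, beta_hat = np.exp(res.x)
cutoff = log_lik(eta_hat, beta_hat) - chi2.ppf(0.90, df=1) / 2.0

def profile_ll(R):
    """Log-likelihood maximized over beta with R(t0) held fixed.

    Uses eta = t0 / (-ln R)^(1/beta), which makes R a parameter of the
    likelihood function.
    """
    obj = lambda lb: -log_lik(t0 / (-np.log(R)) ** (1.0 / np.exp(lb[0])),
                              np.exp(lb[0]))
    return -minimize(obj, x0=[np.log(beta_hat)], method="Nelder-Mead").fun

# Scan candidate reliabilities; those whose profiled likelihood clears the
# cutoff lie within the 90% band [R_L, R_U] at time t0.
Rs = np.linspace(0.01, 0.99, 197)
feasible = [R for R in Rs if profile_ll(R) >= cutoff]
R_lo, R_hi = min(feasible), max(feasible)
print(f"90% band on R({t0:.0f} h): [{R_lo:.3f}, {R_hi:.3f}]")
```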

Maximum Likelihood Estimation from Accelerated Data

The above procedures enable best-fitting of the η and β parameters, or the Type 1 and Type 2 CFP parameters, to a series of data points taken at a particular temperature and applied voltage. Experimentally, however, we have a more complicated situation:

  • To improve confidence and avoid impractically lengthy test times, we want to take data under accelerated conditions and then relate it to less stressful conditions. Specifically, failure data at voltages exceeding the WVDC and at temperatures exceeding the rated constant-use maximum must be related to failures under recommended (or customer-requested) operating conditions.
  • For example, one goal might be: From accelerated test conditions, predict the shortest time at which a cumulative failure percentage is likely to occur under particular operating conditions, e.g., “BX10% failures occur, with 90% confidence, at ≥ 7 years if the device temperature is 85 °C and the applied voltage is 6.3 Volts.”

The derivation of the log likelihood function for accelerated test conditions is shown in Appendix A. The result yields R̂(t, T, V), the maximally likely expression for the cumulative reliability at any given time, temperature, and applied voltage. Further calculations — not shown, performed with the ReliaSoft statistical software package [35] — yield the confidence bounds around the cumulative reliability (or unreliability) estimate.

Fig. 13 shows an example of the results – the best-fit plot and Type 1 90% upper and lower confidence bounds — for one lot of a particular 01005 100 nF broadband capacitor rated at 6.3 V and 85 °C, but tested here at 24 V and 110 °C:


Fig. 13 Best-fit cumulative failure percentage with Type 1 90% upper and lower confidence bounds (dashed lines) for samples of an 01005 broadband 100 nF capacitor rated at 6.3 V and 85 °C, but measured at 24 V and 110 °C

Fig. 14 shows best-fit plots and Type 1 lower confidence bounds at various accelerated temperature and voltage conditions and, finally (rightmost blue plots), the best-fit and 90% lower confidence bound estimated under the maximum rated operating conditions of 6.3 Volts at 85 °C. Under those conditions, the LCL is seen to be about 30 years for a BX1% cumulative failure percentage.


Fig. 14 Best-fit cumulative failure percentage plots with Type 1 lower 90% confidence bounds (dashed lines) for samples of an 01005 broadband 100 nF capacitor under various temperature and applied voltage conditions

The maximum likelihood procedures described for accelerated testing raise a number of issues:

  • The Weibull parameters discussed refer to each single segment of the unreliability curve. We may ask, for example, exactly where – at what time – the steady-state region plot intersects the wear-out region plot, considering the confidence bounds involved, and how we should characterize the transition region.
  • It may be observed that the calculations shown assume a constant shape parameter, β, under all stress conditions. In Fig. 14, this can be seen to be at least roughly true by observing the approximately equal slopes under the various accelerated applied voltages and temperatures. But questions arise as to whether β really remains the same at the use conditions (see a), how to proceed when this is not true, and what errors are created by it being only approximate.
  • The MLE procedure we’ve discussed for calculating confidence bounds is only one of several ways to do it. Others include Beta-Binomial, Fisher matrix, and Bayesian approaches. The methods do not, in general, yield exactly the same answers – so the issue arises as to which is more accurate under a particular set of conditions.
  • A related question is how many parts must be tested, both in the steady-state and wear-out regions, to yield results that are useful to customers. As has been mentioned, most of the latter become concerned when 1% or 2% of their parts fail – which, in the steady-state can be readily calculated from the MTTF alone, but not in the wear-out region (β > 1).
  • Similarly, one might ask for how many hours should each sample be tested, keeping in mind the old adage that “time is money,” and there’s not much point in ending up with a wonderfully characterized part that no one can afford.
  • How do we compromise in determining constants based on a trial at a high voltage and temperature where all the parts fail (so the statistical curve-fitting is good) — but which is far away from actual operating conditions (so may involve mechanisms not significant at the latter) — compared to one where fewer samples fail, but which is closer to actual operating conditions?
  • At the important 1-2% cumulative failure levels, how do we – or can we – extract useful information on the steady-state/wear-out intersection time and level, not to mention (here) the different definitions of failure in the two regions?

In light of the above questions and ambiguities, it is not surprising that many manufacturers have chosen to publish life data based on constant failure rates and n and Ea values that are either undisclosed or assumed close to the PME values of Ea =1.0 eV and n =3.

6.3 Passive Plus Wear-out Region HALT – An Example

The following results were achieved on a particular lot of PPI 01005BB104MW6R3 broadband 100 nF capacitors, which are rated for maximum operation at 6.3 Volts and 85 °C. The tests had three objectives:

  • To determine the constants — G, C, n, and β (see Appendix B) — needed to derive reliability behavior in the wear-out region.
  • To determine the confidence bounds on the predicted reliability.
  • To predict reliability behavior to be extrapolated to voltages and temperatures other than those of the HALTs.

The tests were performed on groups of 20 samples at various temperatures and voltages. Voltages and temperatures were selected, in general, to produce failures within 250-300 hours. A failure was defined as a sample drawing a current ≥ 100 uA with its rated voltage applied. If not all samples had failed after 250-300 hours, the test was terminated and the remaining good parts were treated as data suspensions.
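These raw results translate directly into the failure/suspension data used in the preceding section's likelihood calculations. A small illustration (the leakage log below is hypothetical and abbreviated; the 100 uA threshold and the termination window follow the text):

```python
# Hypothetical per-sample log for part of one 20-sample group: the hour at
# which leakage first reached the 100 uA failure threshold at rated voltage,
# or None if it never did before termination.
first_exceed_h = [12.0, 44.0, None, 130.0, None, 201.0]
T_END = 275.0  # termination time, within the 250-300 hour window in the text

failures = [t for t in first_exceed_h if t is not None]     # exact failure times
suspensions = [T_END for t in first_exceed_h if t is None]  # data suspensions
print(f"{len(failures)} failures, {len(suspensions)} suspensions at {T_END} h")
```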

Fig. 9 depicts the setup on which the HALT tests were made. All units were contacted with spring clips to avoid the possibility of damage resulting from soldering.

The ReliaSoft software package [35] was used to calculate the G, C, n, and β constants and, finally, the BX% cumulative failure percentages at the maximum rated temperature and applied voltage. Results are shown below.

Derived Constants

    C             β      G            n
    3.00 × 10⁻⁶   2.10   1.45 × 10⁴   6.76

Note, per (10), Ea = zG, and the activation energy Ea = 1.25 eV.
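The relation Ea = zG can be checked directly, taking z as Boltzmann's constant in eV/K (an assumption, but one consistent with the quoted 1.25 eV). The same fitted constants also give the implied acceleration factor between the rated condition and a HALT condition under the T-NT life model L(V, T) = (C/V^n)·e^(G/T):

```python
import math

k_B = 8.617e-5  # Boltzmann's constant in eV/K -- assumed to be the "z" in Ea = zG
G = 1.45e4      # fitted thermal parameter (Kelvin), from the table above
n = 6.76        # fitted voltage exponent, from the table above

Ea = k_B * G
print(f"Ea = {Ea:.2f} eV")  # matches the 1.25 eV quoted in the text

# T-NT life model L(V, T) proportional to V^-n * exp(G/T): acceleration factor
# from the rated condition (6.3 V, 85 C) to the HALT condition (24 V, 110 C).
T_use, T_halt = 85.0 + 273.15, 110.0 + 273.15
V_use, V_halt = 6.3, 24.0
AF = (V_halt / V_use) ** n * math.exp(G * (1.0 / T_use - 1.0 / T_halt))
print(f"Acceleration factor = {AF:,.0f}")
```

The factor comes out on the order of 10^5, which is why failures that would take decades at rated conditions can be produced within a few hundred HALT hours.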

“Best-fit” lifetimes, 85 °C, 6.3 V

    MTTF (yrs.)   BX10% (yrs.)   BX2% (yrs.)   BX1% (yrs.)
    410           159            73            52

Single-sided, lower 90% confidence level lifetimes, 85 °C, 6.3 V

    MTTF (yrs.)   BX10% (yrs.)   BX2% (yrs.)   BX1% (yrs.)
    272           105            47            34

  • The BX10%, BX2%, and BX1% lives are felt to be more meaningful reliability characterizations than the MTTF when β ≠1 – see the discussion under Table 1 herein – because the MTTF corresponds to cumulative failures >43%, whereas customers become concerned at much lower failure percentages.
  • Observe how different these BME thin-layer parts are, compared to PME magnesium titanate parts: Ea (BME) = 1.25 eV, whereas Ea (PME) = 1.0 eV; n(BME) = 6.76, whereas n(PME) = 3.0. Thus, it’s no wonder that the BME parts are far more sensitive to applied voltage and temperature.
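The claim that the MTTF corresponds to cumulative failures above roughly 43% can be checked directly: for a Weibull distribution, MTTF = η·Γ(1 + 1/β), so the cumulative failure fraction reached at the MTTF depends only on β. A quick check at the fitted β = 2.10:

```python
import math

beta = 2.10  # shape parameter from the derived-constants table

# For a Weibull distribution, MTTF = eta * Gamma(1 + 1/beta), so the cumulative
# failure fraction at the MTTF is independent of eta:
#   F(MTTF) = 1 - exp(-Gamma(1 + 1/beta)**beta)
F_mttf = 1.0 - math.exp(-math.gamma(1.0 + 1.0 / beta) ** beta)
print(f"F(MTTF) at beta = {beta}: {F_mttf:.1%}")

# For comparison, beta = 1 (the exponential, steady-state case) gives 63.2%.
F_exp = 1.0 - math.exp(-1.0)
```

At β = 2.10 the result lands in the mid-50% range, i.e., well past the failure levels (1-2%) at which customers typically become concerned.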

Additional Considerations. It may be observed that the wear-out region doesn’t begin until the steady-state region has ended, so it can be argued that the above calculated lifetimes are overly pessimistic. Nevertheless, in view of the ambiguities and measurement uncertainties discussed – recall the comment under Fig. 6 that the bathtub curve may not be bathtub-shaped at all, but more like a U or V, with wear-out beginning almost immediately after infant mortality – the above metrics seem appropriate for most conservative customers.

It should be noted that, for any particular product, PPI has the capability to perform testing with different parameters that can yield more precise results, e.g., larger numbers of samples, longer test times, and a particular lot measured both for wear-out and (nominally) steady-state behavior. PPI does routinely perform steady-state and additional testing on selected lots, wherein 125 samples are subjected to moderately elevated voltages and/or temperatures for 1000 hours.

7.0 COTS Product Disclaimer

While PPI, as described, makes every effort to ensure a reliable product, it must be understood that: PPI offers these devices as commercial off-the-shelf (COTS) parts. No representation or guarantee is made or implied of a specific maximum failure rate or cumulative failure percentage.

8.0 Conclusion

Barium-titanate base-metal-electrode MLCCs with rare-earth-additive core-shell structures offer a unique solution to very broadband RF coupling and bypass requirements. However, their high capacitance values, ultra-miniature sizes, extremely thin inter-electrode spacings, and basic material properties substantially affect their time to failure and can create large lot-to-lot variations in behavior. Compared to the older magnesium-titanate/palladium-electrode MLCCs, their voltage and temperature acceleration constants are much larger and more variable lot-to-lot, and their failure rates increase as time passes, i.e., failures occur primarily in the wear-out regime. Thus, to characterize the reliability of these devices, it is important to carefully screen them under a variety of voltage and temperature stresses and then to properly statistically process and interpret the results.

Appendix A – Log Likelihood Function for Accelerated Temperatures/Voltages

To derive the log likelihood function for the accelerated temperature and voltage cases, we begin with (14) from the main text (reproduced below):

    L(V, T) = (C/V^n) · e^(G/T)    (14)

where
    L(V, T) = some measure of life — e.g., η, MTTF, BX% life, etc.
    V = applied voltage
    T = temperature
    C = a constant to be determined
    G and n, as defined under (10) and (12) respectively, are parameters to be determined

We can then substitute (14) for η in the Weibull pdf and (1 − cdf) functions. Reproducing the pdf, (22), from the main text,

    f(t) = (β/η) · (t/η)^(β−1) · exp[−(t/η)^β]    (22)

and letting η = L(V, T), we obtain

    f(t; V, T) = [β/L(V,T)] · [t/L(V,T)]^(β−1) · exp{−[t/L(V,T)]^β}    (A-1)

The same substitution in the survival function, 1 − F(t) = exp[−(t/η)^β], yields

    R(t; V, T) = exp{−[t/L(V,T)]^β}    (A-2)

Imagine now (A-1) and (A-2) substituted into the log likelihood function, ln L, but with each group of terms having sub-groups that differ with the specific applied temperatures and voltages and the resulting failure or suspension times.

For example, testing at three different temperatures and three different applied voltages would produce nine sub-groups each for the failures and suspensions in the log likelihood function. Following [32], the log likelihood function, Λ = ln L, is given by

    Λ = Σ(i=1 to Fe) Ni · ln[(β/ti) · (ti·Vi^n·e^(−G/Ti)/C)^β] − Σ(i=1 to Fe) Ni · (ti·Vi^n·e^(−G/Ti)/C)^β − Σ(i=1 to S) N′i · (t′i·Vi^n·e^(−G/Ti)/C)^β

where:

  • Fe is the number of groups of exact times-to-failure data points.
  • Ni is the number of times-to-failure data points in the ith time-to-failure data group.
  • β is the Weibull shape parameter (unknown, the first of four parameters to be estimated).
  • G is the second T-NT parameter (unknown, the second of four parameters to be estimated).
  • C is the third T-NT parameter (unknown, the third of four parameters to be estimated).
  • n is the fourth T-NT parameter (unknown, the fourth of four parameters to be estimated).
  • Ti is the temperature level of the ith group.
  • Vi is the applied voltage to the ith group.
  • ti is the exact failure time of the ith group.
  • S is the number of groups of suspension data points.
  • N′i is the number of suspensions in the ith group of suspension data points.
  • t′i is the running time of the ith suspension data group.

The solution (parameter estimates) is found by solving for G, C, n, and β, such that ∂Λ/∂G, ∂Λ/∂C, ∂Λ/∂n, and ∂Λ/∂β each equal 0.
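This estimation can be sketched numerically (the grouped HALT data below are hypothetical, and a direct Nelder-Mead minimization stands in for solving the four partial-derivative equations):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical grouped HALT data: (T in K, V in volts, failure times h, suspension times h)
groups = [
    (383.15, 24.0, [20.0, 35.0, 60.0, 90.0], [250.0]),
    (383.15, 16.0, [120.0, 180.0, 240.0], [250.0, 250.0]),
    (358.15, 24.0, [90.0, 150.0, 220.0], [250.0, 250.0]),
]

def life(V, T, C, G, n):
    # T-NT life model: L(V, T) = (C / V^n) * exp(G / T)
    return (C / V ** n) * np.exp(G / T)

def neg_log_lik(p):
    log_beta, G, log_C, n = p
    beta, C = np.exp(log_beta), np.exp(log_C)
    ll = 0.0
    for T, V, fails, susps in groups:
        eta = life(V, T, C, G, n)
        t = np.asarray(fails)
        # failures contribute Weibull pdf terms; suspensions, survival terms
        ll += np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(t / eta)
                     - (t / eta) ** beta)
        ll -= np.sum((np.asarray(susps) / eta) ** beta)
    return -ll

x0 = [np.log(1.5), 1.0e4, np.log(1.0e-4), 5.0]
res = minimize(neg_log_lik, x0, method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-9, "xatol": 1e-9})
beta_hat, G_hat, C_hat, n_hat = np.exp(res.x[0]), res.x[1], np.exp(res.x[2]), res.x[3]
print(f"beta = {beta_hat:.2f}, G = {G_hat:.3g} K, n = {n_hat:.2f}")
```

With only a few stress conditions, C and G trade off strongly, so real fits use many more samples and conditions than this toy example.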

When the above is completed, we have best-fit or maximally likely values for G, C, n, and β – Ĝ, Ĉ, n̂, and β̂, respectively. Substituting these in (A-2) yields R̂(t, T, V), the maximally likely expression for the cumulative reliability – or 1 minus the cumulative failure probability – at any given time, temperature, and applied voltage.
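With fitted constants in hand, evaluating (A-2) is straightforward. The sketch below uses the rounded constants from Section 6.3; because of that rounding it reproduces the tabulated lifetimes only approximately:

```python
import math

# Rounded constants from Section 6.3: C, beta, G, n
C, beta, G, n = 3.00e-6, 2.10, 1.45e4, 6.76
HOURS_PER_YEAR = 8766.0

def char_life_h(V, T_K):
    # Characteristic life eta = L(V, T) = (C / V^n) * exp(G / T), in hours
    return (C / V ** n) * math.exp(G / T_K)

def R_hat(t_h, V, T_K):
    # Maximally likely cumulative reliability, per (A-2)
    return math.exp(-((t_h / char_life_h(V, T_K)) ** beta))

def bx_life_years(x, V, T_K):
    # Time at which cumulative failure fraction x is reached (inverting A-2)
    return char_life_h(V, T_K) * (-math.log(1.0 - x)) ** (1.0 / beta) / HOURS_PER_YEAR

bx10 = bx_life_years(0.10, 6.3, 85.0 + 273.15)
print(f"BX10% at 6.3 V, 85 C: roughly {bx10:.0f} years")
```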

It remains to determine the confidence bounds on , which can be done using several methodologies, including beta binomial bounds, likelihood estimation, and Fisher matrices. PPI uses ReliaSoft software to perform the calculations.

REFERENCES

  1. J. Scarpulla, “Thin MLCC Reliability Evaluation Using an Accelerated Ramp Voltage Test,” IEEE Reliability Society, Sept. 28-30, 2016
  2. Dock Brown, “Multilayer Ceramic Capacitors: Mitigating Rising Failure Rates,” DfR Solutions, Seattle, WA
  3. D. Liu, “Selection, Qualification, Inspection, and Derating of Multilayer Ceramic Capacitors with Base-Metal Electrodes,” Parts, Packaging, and Assembly Technologies Branch, Code 562, NASA Goddard Space Flight Center, 2013
  4. P. Foeller, “Novel materials and routes for rare-earth-free BaTiO3-based ceramics for MLCC applications,” University of Sheffield, Department of Materials Science and Engineering, Sept. 2017
  5. A. Teverovsky, “Leakage Currents in Low-Voltage PME and BME Ceramic Capacitors,” 7th International Conference on Electroceramics (ICE’15), May 13-16, 2015, State College, PA
  6. D. Liu, “Evaluation of Commercial Automotive-Grade BME Capacitors,” Capacitors and Resistors Technology Symposium (CARTS) conference, Santa Clara, CA, April 1-3, 2014
  7. A. Teverovsky, “A Thermal Runaway Failure Model for Low-voltage BME Ceramic Capacitors with Defects,” ASRC Space and Defense, Greenbelt, MD 20771, USA, 2017
  8. “Maximum Likelihood Function,” Reliability Hotwire, Issue 33, Nov. 2003, <https://www.weibull.com/hotwire/issue33/relbasics33.htm>
  9. “Contour Plots and Confidence Bounds on Parameters,” Reliability Hotwire, Issue 18, Aug. 2002, <https://www.weibull.com/hotwire/issue18/relbasics18.htm>
  10. “Contour Plots and Confidence Bounds on Parameters (Part II),” Reliability Hotwire, Issue 19, Sept. 2002, <https://www.weibull.com/hotwire/issue19/relbasics19.htm>
  11. “The Limitations of Using the MTTF as a Reliability Specification,” ReliaSoft Resource Center, <https://www.reliasoft.com/resources/resource-center/the-limitations-of-using-the-mttf-as-a-reliability-specification>
  12. T. Ashburn, et al., “Highly Accelerated Testing of Capacitors for Medical Applications,” SMTA Medical Electronics Symposium, Anaheim, CA, 2008
  13. Maher, et al., “Electric Field Effects on the Insulation Resistance of Various Types of BaTiO3-based X7R MLCCs at Elevated Temperatures,” The 11th US-Japan Seminar on Dielectric and Piezoelectric Ceramics, September 9-12, 2003, Sapporo, Hokkaido, Japan
  14. M. Randall, et al., “Lifetime Modeling of Sub 2 micron Dielectric Thickness BME MLCC,” CARTS 2003 Conference
  15. “Chi-Square Distribution,” NIST Engineering Statistics Handbook, <https://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm>
  16. D. Liu, “A General Reliability Model for Ni-BaTiO3-Based Multilayer Ceramic Capacitors,” CARTS Conference, April 1-3, 2014, Santa Clara, CA
  17. S. Jeon, “The Mechanism of Core/Shell Structure Formation During Sintering of BaTiO3-Based Ceramics,” Journal of the American Ceramic Society, Feb. 24, 2012
  18. MIL-PRF-55681G, July 12, 2016, “Performance Specification: Capacitor, Chip, Multiple Layer, Fixed, Ceramic Dielectric, Established Reliability, General Specification For”
  19. MIL-PRF-123D, “Capacitors, Fixed, Ceramic Dielectric (Temperature Stable and General Purpose), High Reliability, General Specification”
  20. MIL-PRF-32535, September 28, 2015, “Performance Specification: Capacitor, Fixed, Ceramic Dielectric (Temperature Stable and General Purpose), Extended Range, High Reliability and Standard Reliability, General Specification For”
  21. “What is the mechanism of the changing of the capacitance of ceramic capacitors over time?” Support, Murata Manufacturing Co., Ltd.
  22. “Differences Between Type I (Time) and Type II (Reliability) Confidence Bounds,” Reliability Hotwire, Issue 17, July 2002, <https://www.weibull.com/hotwire/issue17/relbasics17.htm>
  23. “Confidence Bounds,” ReliaWiki, <https://help.reliasoft.com/reference/life_data_analysis/lda/confidence_bounds.html>
  24. “Why is chi square used when creating a confidence interval for the variance?” Cross Validated (StackExchange), <https://stats.stackexchange.com/questions/76444/why-is-chi-square-used-when-creating-a-confidence-interval-for-the-variance>
  25. “Chi-Squared Confidence Intervals,” StudyPug, Statistics/Confidence Intervals, <https://www.studypug.com/statistics-help/chi-squared-confidence-intervals>
  26. “What is censored data,” reliability: A Python library for reliability engineering, <https://reliability.readthedocs.io/en/latest/What%20is%20censord%20data.html>
  27. “Chi-Squared Distribution and Reliability Demonstration Test Design,” Reliability Hotwire, Issue 116, October 2010, <https://www.weibull.com/hotwire/issue17/relbasics17.htm>
  28. A. Gorski, “Chi-Square Probabilities are Poisson Probabilities in Disguise,” IEEE Transactions on Reliability, Vol. R-34, Aug. 1985
  29. B. Seymour, “MTTF, Failrate, Reliability, and Life Testing,” Burr-Brown Application Bulletin, Dec. 1993
  30. P. Ellerman, “Calculating Reliability using FIT & MTTF: Arrhenius HTOL Model,” Microsemi MicroNote 1002, Jan. 9, 2012
  31. P. Ellerman, “Calculating Chi-squared (X²) for Reliability Equations,” Microsemi MicroNote 1003, Jan. 9, 2012
  32. “Temperature-NonThermal Relationship,” ReliaWiki, last revision 2/8/17, <https://help.reliasoft.com/reference/accelerated_life_testing_data_analysis/alt/temperature-nonthermal_relationship.html>
  33. “Wilks’ theorem,” Wikipedia, <https://en.wikipedia.org/wiki/Wilks%27_theorem>
  34. “Proof of Wilks’ Theorem on LRT,” <https://www.ath.umd.edu/slud/s701.S14/WilksThm.pdf>
  35. ReliaSoft Weibull++ software suite, <https://www.reliasoft.com/about/worldwide-contacts>
