All posts by Hans Liedberg

Interpolating between discrete calibration points: least squares curve fitting

ABSTRACT
In preceding publications, we discussed the accuracy of interpolation using reference functions for temperature sensors (thermocouples and platinum resistance thermometers (PRTs)), as well as a crude approach to interpolation uncertainty in the absence of any knowledge of the interpolating function. This article discusses how to fit curves to PRT calibration results using the linear least squares technique (implemented with the matrix functions of Microsoft Excel or LibreOffice Calc), both directly to the measured data and to deviations from a reference function. (An overview of the fitting method may be found in [Numerical Recipes in Fortran 77, Chapter 15: Modeling of Data].)

INTRODUCTION

In the preceding articles, we saw that
(i) industrial PRT calibration results could be interpolated “piece-wise” (between any two calibration points a few hundred degrees Celsius apart) to an accuracy of around 0.05 °C, when expressed as deviations from the Callendar-van Dusen reference functions, and,
(ii) without any knowledge of the accuracy of the interpolating function, the accuracy of interpolated values could be grossly estimated as 0.15 to 0.5 °C (for the above-mentioned spacing between calibration points).
While the latter approach can be applied generically to almost any calibration data, it has the drawback that the uncertainty of interpolated values is unrealistically large when working close to a calibration point. In the present article, the most accurate approach to the problem will be implemented, namely, to fit a curve to the complete set of calibration results. Firstly, a Callendar equation will be fitted directly to the measured data, and, secondly, a quadratic polynomial will be fitted to the deviations of the measured data from the ITS-90 PRT reference function. As the ITS-90 reference function deals with most PRT non-linearity, it is expected that the latter approach will produce the more accurate interpolation.

THE MODEL EQUATION

The Callendar equation, applicable above 0 °C, is: Rt = R0∙(1 + A∙t + B∙t^2), where Rt is the resistance (in Ω) at temperature t (in °C) and R0 is the resistance (in Ω) at the ice point (0.00 °C). Rearranging the equation, A∙t + B∙t^2 = Rt/R0 – 1, indicating that the independent variable x = t and the dependent variable y = Rt/R0 – 1.
(As the sensitivity, d(Rt/R0)/dt, will be required to convert uncertainties from temperature units, we also note that d(Rt/R0)/dt = A + 2∙B∙t. We will use the coefficients from IEC 60751, namely, A = 3.9083e-3 and B = -5.775e-7, to calculate sensitivities below.)
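As a small check of these formulas, the Callendar equation and its sensitivity can be evaluated directly. This is a minimal sketch: the nominal Pt100 ice-point resistance R0 = 100 Ω is an assumption for illustration, not a value from the article's data.

```python
# Callendar equation above 0 degC: Rt = R0*(1 + A*t + B*t**2),
# with the IEC 60751 coefficients quoted in the text.
A = 3.9083e-3    # 1/degC
B = -5.775e-7    # 1/degC**2
R0 = 100.0       # ohm; nominal Pt100 ice-point resistance (assumed)

def resistance(t):
    """Resistance in ohm at temperature t in degC."""
    return R0 * (1 + A * t + B * t * t)

def sensitivity(t):
    """d(Rt/R0)/dt = A + 2*B*t, in 1/degC."""
    return A + 2 * B * t
```

For a Pt100 this gives resistance(100) ≈ 138.51 Ω, and a sensitivity near 0 °C of R0∙(A + 2∙B∙0) ≈ 0.391 Ω/°C, the value used below to find R0 from R(0.01 °C).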

The measured data are as follows:

[Table: IPRT calibration data.]

To find R0 from R(0.01 °C): R0 = R(0.01 °C) + (0.00 °C – 0.01 °C) ∙ 0.391 Ω/°C. Uncertainties are converted to the same units as y. (The first data point is used to determine R0, so only the four subsequent data points are used in the fit.)
[Table: IPRT data converted to appropriate units.]

The general model equation is c1∙f1(x) + c2∙f2(x) + … = y. For the Callendar equation, c1 = A, f1(x) = x, c2 = B and f2(x) = x^2.
The four data points lead to four simultaneous equations, namely:
c1∙f1(x1) + c2∙f2(x1) = y1
c1∙f1(x2) + c2∙f2(x2) = y2
c1∙f1(x3) + c2∙f2(x3) = y3
c1∙f1(x4) + c2∙f2(x4) = y4
Expressed in matrix notation, they are:
[Image: the unweighted simultaneous equations in matrix notation, A∙c = y.]
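In code, the design matrix A is simply one row [f1(x_i), f2(x_i)] per calibration point. A sketch, using hypothetical temperatures rather than the article's measured data:

```python
# Design matrix for c1*f1(x) + c2*f2(x) = y, with f1(x) = x and f2(x) = x**2.
x = [100.0, 200.0, 300.0, 420.0]          # hypothetical temperatures, degC
A_matrix = [[xi, xi * xi] for xi in x]    # one row per data point

def model_y(c, row):
    """Left-hand side c1*f1(x) + c2*f2(x) for one row of the design matrix."""
    return c[0] * row[0] + c[1] * row[1]
```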

WEIGHTING FACTORS

Now, apply the weighting factors 1/ui to each equation (smaller standard uncertainty => larger weight). The weighted simultaneous equations, expressed in matrix notation, are: W∙A∙c = W∙y.
[Image: the weighting matrix W, a diagonal matrix with entries 1/u_i.]

THE SOLUTION – NORMAL EQUATIONS

The optimal (least squares) solution to this over-determined system is obtained by left-multiplying by the matrix transpose (W∙A)^T = A^T∙W^T, to obtain the normal equations:
[Image: the normal equations, A^T∙W^T∙W∙A∙c = A^T∙W^T∙W∙y.]
The matrix W^T∙W simply contains 1/variance at each calibration point:
[Image: the matrix W^T∙W, diagonal with entries 1/u_i^2.]
[Image: the solution for the coefficient vector, c = (A^T∙W^T∙W∙A)^-1∙A^T∙W^T∙W∙y.]
Then, to find the fitted y-value for any value of x:
[Image: the fitted value, y(x) = c1∙f1(x) + c2∙f2(x).]
[Image: variance(y(x)) = f(x)^T∙(A^T∙W^T∙W∙A)^-1∙f(x), with f(x) = (f1(x), f2(x))^T.]
Finally, expanded uncertainty of y = 2*√variance(y): this is called the “propagated uncertainty” of y at x.

The numerical implementation of this “linear least squares” curve-fitting technique follows, using the Excel or Calc matrix functions TRANSPOSE(), MMULT() and MINVERSE():
[Image: spreadsheet implementation of the matrix calculations.]
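The same sequence of matrix operations can be sketched outside a spreadsheet. The fragment below solves the 2×2 normal equations in closed form, in plain Python; the data are synthetic, generated exactly from the IEC 60751 coefficients (not the article's measured IPRT values), so the fit should recover those coefficients.

```python
import math

# Weighted least squares fit of y = c1*x + c2*x**2 via the normal
# equations (A^T W^T W A) c = A^T W^T W y, solved in closed form for
# the 2x2 case. Data are synthetic (IEC 60751), not measured values.
A0, B0 = 3.9083e-3, -5.775e-7
x = [100.0, 200.0, 300.0, 420.0]            # temperatures, degC (assumed)
y = [A0 * t + B0 * t * t for t in x]        # exact Rt/R0 - 1 values
u = [2e-5, 2e-5, 3e-5, 4e-5]                # std uncertainties of y (assumed)

w = [1.0 / ui ** 2 for ui in u]             # diagonal of W^T W
S11 = sum(wi * xi ** 2 for wi, xi in zip(w, x))
S12 = sum(wi * xi ** 3 for wi, xi in zip(w, x))
S22 = sum(wi * xi ** 4 for wi, xi in zip(w, x))
T1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
T2 = sum(wi * xi ** 2 * yi for wi, xi, yi in zip(w, x, y))

det = S11 * S22 - S12 ** 2
A_fit = (S22 * T1 - S12 * T2) / det         # coefficient c1 (Callendar A)
B_fit = (S11 * T2 - S12 * T1) / det         # coefficient c2 (Callendar B)

# Covariance matrix C = (A^T W^T W A)^-1, used to propagate uncertainty:
C11, C12, C22 = S22 / det, -S12 / det, S11 / det

def u_fit(t):
    """Propagated standard uncertainty of the fitted y at temperature t."""
    f1, f2 = t, t * t
    return math.sqrt(f1 * f1 * C11 + 2 * f1 * f2 * C12 + f2 * f2 * C22)
```

With noise-free synthetic data, A_fit and B_fit match A0 and B0 to floating-point precision; 2∙u_fit(t) corresponds to the "propagated uncertainty" described above. (Note that u_fit(0) = 0 here, because f1 = f2 = 0 at t = 0.)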
Now to check the fitted curve against the measured data (residual = measured – fitted):
[Table: residuals of the Callendar fit.]
Are the residuals small enough? That is, does the model fit the data well enough, relative to the uncertainties? The chi-squared statistic, chi^2 = (residual_1/u1)^2 + (residual_2/u2)^2 + …, will tell us:
[Image: chi-squared value for the Callendar fit.]
Chi-squared is larger than the degrees of freedom (= number of data points – number of fitted parameters = 4 – 2 = 2), so either the uncertainties are underestimated, or the model does not represent the data well (relative to the uncertainties). If chi^2 ≲ d.o.f., then the fit is “good enough”. (This is only strictly true if the uncertainties at different points are uncorrelated, but we will assume that this is the case.)
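The chi-squared check amounts to a one-line sum. The residual and uncertainty values below are illustrative, not the article's fit results:

```python
# Chi-squared goodness-of-fit check: each residual is scaled by the
# standard uncertainty at that point. Values are illustrative only.
residuals = [0.8e-5, -1.2e-5, 2.5e-5, -0.9e-5]
u = [1.0e-5, 1.0e-5, 1.5e-5, 2.0e-5]

chi2 = sum((r / ui) ** 2 for r, ui in zip(residuals, u))
dof = len(residuals) - 2        # 4 data points, 2 fitted parameters
fit_ok = chi2 <= dof            # rough "good enough" criterion from the text
```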

DEVIATION FROM ITS-90 REFERENCE FUNCTION

Perhaps direct fitting to the Callendar equation is not good enough, at this level of uncertainty. Let’s try deviations from the ITS-90 reference function – the reference function should take care of much of the PRT’s non-linearity, leaving us with smoother data to fit.

For the range 0 to 660 °C, the ITS-90 document gives the deviation function as: W-Wr = a∙(W-1) + b∙(W-1)^2 + c∙(W-1)^3, where W = Rt/Rtp and Wr is the value of the reference function.
It is good practice in curve fitting to use as few coefficients as will adequately fit the data, so we will try two, a and b.
[Table: deviations of the measured data from the ITS-90 reference function.]
[Image: spreadsheet implementation of the deviation-function fit.]
Now to check the fitted curve against the measured data (residual = measured – fitted):
[Table: residuals of the deviation-function fit.]
It can be seen that this fit is better than the previous one. (Residuals are around half the size.)
[Image: chi-squared value for the deviation-function fit.]
Chi-squared is smaller than the degrees of freedom, indicating that the curve fits the data well, within the uncertainties.

Below is a table of fitted values. Because of the structure of the ITS-90 functions, the uncertainty of the curve is zero at the water triple point (WTP, 0.01 °C). The uncertainty at WTP is added to the uncertainty of the curve, in the rightmost column.
[Table: fitted values and propagated uncertainties.]

CONCLUSIONS

∙Both curves, fitted directly to measured PRT data, and to deviations from a reference function, represent the behaviour of the UUT better than do the individual data points. (The curves take into account all the data points, thereby potentially “smoothing out” random errors in measurement.)
∙The ITS-90 deviation function fits the data better, as is expected when using a good reference function.
∙When weighted least squares fitting is performed, the covariance matrix provides a statistically justified manner of propagating uncertainty to intermediate points, which results in small uncertainties. (To be strictly correct, we should have considered possible correlation between data points, but, as long as the dominant uncertainty component(s) are uncorrelated, our approach is reasonable.)

(Contact the author at LMC-Solutions.co.za.)

Interpolating between discrete calibration points: the effect on uncertainty – Addendum

ABSTRACT
This publication follows “Interpolating between discrete calibration points: the effect on uncertainty” of September 2025, describing a more general approach to interpolation of calibration uncertainty (not specific to a particular type of sensor), drawing a tentative conclusion regarding the spacing of calibration points versus resultant uncertainty, and applying this approach to digital thermometer, PRT and thermocouple calibration data.

INTERPOLATING UNCERTAINTY “BY EYE”

Here are two sets of calibration results, reporting correction or error of the Unit Under Test (UUT), and expanded uncertainty, at various temperatures:

[Table: calibration results of a digital thermometer with type K thermocouple sensor, and an industrial PRT.]

Looking at the left-hand (digital thermometer) results, what do we observe?:
1. The correction at all three temperatures is effectively constant, relative to the calibration uncertainty.
2. The UUT is a thermocouple thermometer with a range of -50 to 300 °C. So, none of the calibration temperatures appears to be “special”. (In this context, “special” temperatures are those where the thermometer might be adjusted to have small corrections, for example, the ends of the operating range, room temperature (where all signal comes from Cold Junction Compensation and none from the sensor), and temperatures where the measuring electronics change range. For a liquid-in-glass thermometer, “special” temperatures would be scale pointing marks.)
These observations lead us to a (tentative) conclusion: The thermometer correction is stable at three “random” temperatures, so the correction at intermediate temperatures can probably be estimated with confidence, without any increase in uncertainty.

Now, looking at the right-hand (IPRT) results, we see:
1. The error varies significantly between calibration points, relative to the uncertainty.
2. The error varies fairly linearly with temperature, though deviation from a straight line is sometimes larger than the calibration uncertainty (at 232 °C, in this case):

[Graph: calibration results of the above industrial PRT.]

What may we conclude from the IPRT data?: If we wish to estimate the error at an intermediate temperature by linear interpolation, the uncertainty at this intermediate temperature should probably be larger than that at the neighbouring calibration temperatures.

INTERPOLATING UNCERTAINTY – NUMERICAL ESTIMATE

How may we numerically estimate the additional uncertainty caused by interpolation?: If we assume that the value at the interpolated point lies between the two neighbouring calibration values, with an equal probability anywhere in that range, the uncertainty caused by interpolation may be estimated as |corr_1 – corr_2|, as the full-width of a rectangular distribution. (Note: The assumption that the value lies between the two neighbouring calibration values is not necessarily correct for “badly behaved” instruments.) To obtain standard uncertainty, u_interp, divide this by 2√3. Or, for expanded uncertainty, U_interp = u_interp*2 = |corr_1 – corr_2| / √3. Total uncertainty U_tot = √[U_cal^2 + U_interp^2].
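The arithmetic above is easy to encapsulate. A minimal sketch, where corr_1, corr_2 and U_cal stand for the two neighbouring calibration corrections and the (k = 2) calibration uncertainty:

```python
import math

def u_interp(corr_1, corr_2):
    """Standard uncertainty from interpolation: the full width
    |corr_1 - corr_2| of a rectangular distribution, divided by 2*sqrt(3)."""
    return abs(corr_1 - corr_2) / (2 * math.sqrt(3))

def U_total(U_cal, corr_1, corr_2):
    """Expanded (k = 2) total uncertainty at an interpolated point."""
    U_int = 2 * u_interp(corr_1, corr_2)   # = |corr_1 - corr_2| / sqrt(3)
    return math.sqrt(U_cal ** 2 + U_int ** 2)
```

When the two corrections are equal, U_total reduces to U_cal, matching the observation below that stable instruments gain essentially no interpolation uncertainty.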
This technique is applied for the digital thermometer (using a more recent, larger, data set) and the IPRT mentioned above, with values interpolated between low, middle and high temperatures (in bold) being compared to measured values at intermediate temperatures:

[Table: linear interpolation between bold calibration results, for digital thermometer and IPRT.]

Linear interpolation between the bold calibration results differs from the measured corrections or errors at intermediate temperatures (residual = measured – interpolated) by less than the estimated total uncertainty, suggesting that this approach to interpolating uncertainty is “safe”, at least for these two instruments.
For the digital thermometer, the uncertainty of interpolated values is essentially the same as that of neighbouring calibration points (it differs by less than 5%), while for the IPRT, interpolated values have significantly (~ten times) larger uncertainty. This seems like a reasonable approach when one has no deeper knowledge of the interpolating function (reference function) being used. (The only assumption is that the correction or error varies “more-or-less” monotonically between calibration points.) Note that interpolation may be much more accurate if one does have deeper knowledge of the interpolating function, as reported in the preceding paper, or if one performs a least squares fit to the full set of data.

The digital thermometer results above were stable, and those of the IPRT close to linear. What about non-linear results? The tables and graphs below show results for a type R thermocouple (linear) and a type K thermocouple (non-linear):

[Table: linear interpolation between bold calibration results, for type R and type K thermocouples.]

[Graph: calibration results for type R and type K thermocouples.]

It can be seen that, though the type K’s results are non-linear (and even slightly non-monotonic) at low temperatures, the residual (= measured – interpolated) is always smaller than the total uncertainty.

SPACING BETWEEN CALIBRATION POINTS

If one wishes to add negligible uncertainty from interpolation, how far apart may the calibration points be? Consider the following “generic” calibration results:

[Table: generic calibration results; correction or error y vs independent variable x.]

If the correction or error, y, changes by at most half the expanded uncertainty between one calibration point and the next, then the total uncertainty at interpolated points is essentially the same as at the calibration points (it differs by less than 5%).
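A quick check of this rule, using the rectangular-distribution estimate from the previous section: if |y1 – y2| = U/2, the interpolation term inflates the total uncertainty by only about 4 %.

```python
import math

# If the correction changes by half the expanded uncertainty U between
# neighbouring points, then U_interp = (U/2)/sqrt(3), and
# U_tot = U * sqrt(1 + 1/12) ~= 1.041 * U: under a 5 % increase.
U = 1.0
U_interp = (U / 2) / math.sqrt(3)
U_tot = math.sqrt(U ** 2 + U_interp ** 2)
```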

CONCLUSION

∙The correction of a UUT at a point intermediate between calibration points may be estimated in a simple manner, by interpolating linearly between the two adjacent calibration values. The uncertainty at this interpolated point may be grossly estimated by adding the difference between the two calibration values in quadrature to the calibration uncertainty. For UUTs with stable corrections, this adds negligibly to the uncertainty, but if the correction varies significantly between calibration points, the resultant uncertainty at the intermediate point is much larger.
∙If the user specifies the calibration points, he takes responsibility for the uncertainty between points [Petersen, “Principles for Calibration Point Selection”, NCSLI Measure, Volume 8, No 3, 2013].
∙If the user only specifies the range, the calibration laboratory should suggest calibration points based on
– instrument manufacturer’s recommendation
– understanding of the operating principles of the instrument
– historical experience with this type or model of instrument (“type testing”)
– in the absence of further information, sufficient points that the UUT correction changes by at most half the expanded uncertainty, between one calibration point and the next (unless some uncertainty is added for interpolation)
– ideally, the calibration laboratory should be confident enough to include a statement in the certificate such as “Effects due to interpolation are considered negligible over the calibrated range.”
∙The user should agree to the suggested calibration points during contract review.

(Contact the author at LMC-Solutions.co.za.)

Interlaboratory comparisons (Proficiency Testing) among calibration laboratories – how to choose the assigned value (Reference Value) – Addendum

ABSTRACT
This publication follows “Interlaboratory comparisons (Proficiency Testing) among calibration laboratories – how to choose the assigned value (Reference Value)” of October 2025, describing additional techniques for visualizing and interpreting results. Participants’ reported probability distributions are combined for visual review, Cox’s Largest Consistent Subset approach is applied to remove outliers, and the recommendations in this and the previous paper are applied to a small Thermometry ILC with four participants.

VISUAL REVIEW OF DATA

First, we continue to discuss the infrared thermometry ILC involving Ref Lab and 13 participants, from the previous paper: the kernel density plots that were used to examine the data for items A and D combine normal distributions with equal standard deviations (widths) around each participant’s result. (Plots were generated using the density() function in the R programming language.) What would the plots look like if the widths were derived from the participants’ estimated uncertainties (which vary by an order of magnitude)? And what would be the effect of including Ref Lab in the combination (convolution) of normal distributions?
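Both variants amount to averaging normal densities centred on the results: equal widths give a kernel density plot, while widths taken from reported uncertainties give the weighted variant. A minimal sketch (with illustrative numbers, not the ILC data):

```python
import math

def combined_density(t, results, sigmas):
    """Average of normal densities centred on each participant's result.
    Equal sigmas give a kernel-density-style curve; sigmas set to the
    participants' standard uncertainties give the weighted variant."""
    total = 0.0
    for xi, si in zip(results, sigmas):
        total += math.exp(-0.5 * ((t - xi) / si) ** 2) / (si * math.sqrt(2 * math.pi))
    return total / len(results)
```

A result with a small sigma contributes a sharp, high peak, which is why Ref Lab (smallest uncertainties) dominates the combined curves below.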

[Image: kernel density plots for item A.]

[Image: combined probability distributions for item A.]

For item A, using the participants’ uncertainties (second row of graphs above) tends to “smear out” the secondary modes (lower peaks) observed in the kernel density plots, suggesting that results in these areas have larger estimated uncertainties. (The effect is similar to enlarging the bandwidth of the kernel density plot.) Including the Ref Lab results in the combined distributions (third row of graphs) adds or heightens a peak in the 23, 34 and 80 °C data. As Ref Lab has the smallest uncertainties, it produces sharp, high peaks. (This is similar to how the weighted mean is dominated by the results with the smallest uncertainties.)

[Image: kernel density plots for item D.]

[Image: combined probability distributions for item D.]

For item D, using the participants’ uncertainties removes the secondary peak for 23 °C, splits the main peak for 34 °C into two, and reduces the secondary peak for 40 °C.
For 80 °C, two peaks become three: -3 K (participants P1, P5, P6, P7), +1.5 K (P9, P11) and +3 K (P12, P13). It is interesting to observe that, if their uncertainties are small, just two participants (out of p = 10) can create a significant peak. Ref Lab’s result at 80 °C lies around halfway between the -3 K and +1.5 K peaks: does this suggest a difference in method between P1-P5-P6-P7 and P9-P11, Ref Lab applying a measurement technique halfway between these two populations? A similar split (P1-P2-P3-P5-P6-P7 vs P9-P10-P11) may be present at 40 °C, but the differences are smaller and therefore less distinct.
(It should be noted that Ref Lab and participants P1 to P8 used blackbody targets with emissivity ≈ 1.00, while P9 to P13 used flat plates with ε ≈ 0.95. The latter results are corrected to ε ≈ 1.00 (corrections ~ 0.7 K around 37 °C and 2.5 K at 80 °C), but perhaps some emissivity-related effects remain.)

A general observation, regarding the convolution of reported results (with widely differing uncertainties) vs the use of a kernel density plot (with equal bandwidth for all results): In practice, calibration laboratories using similar measurement standards and equipment (and therefore having similar measurement capabilities in reality) may report very different uncertainties, because, for example, some accredited CMCs are conservative and others are not. For this reason, it is suggested that reported uncertainties are of little value in visualizing the distribution of results, and a kernel density plot (with equal bandwidth for all results) is recommended.

LARGEST CONSISTENT SUBSET OF RESULTS

The above distributions of results are mostly multi-modal (having several local maxima) and asymmetric. Would the removal of “inconsistent” results improve the situation? (Our ideal combined distribution would be symmetric, with only one peak.) [Cox, “The evaluation of key comparison data: determining the largest consistent subset”, 2005] proposes a chi-squared test for consistency between the Weighted Mean (assigned value, y) and the participants’ results, x_i, with the “worst” results (largest contributors to chi-squared) being removed one-by-one, until the observed chi-squared value “passes” and we are left with the “Largest Consistent Subset” (or, LCS):
[Image: Cox’s chi-squared consistency test, chi^2_obs = Σ (x_i – y)^2 / u^2(x_i).]
(The threshold value of chi^2 may be calculated using the spreadsheet function CHISQ.INV.RT(0.05,N-1), where N = number of participants contributing to RV, or the Largest Consistent Subset may be directly determined using the LCS() function in the metRology library of the R programming language.)
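Cox's procedure can be sketched as a simple loop: compute the weighted mean, test chi-squared against the threshold, and drop the largest contributor until the subset passes. This is a simplified illustration, not the metRology implementation; the threshold function is supplied by the caller (e.g. values from CHISQ.INV.RT(0.05, n-1)).

```python
def largest_consistent_subset(x, u, chi2_crit):
    """Indices of the largest subset consistent with its weighted mean.
    x: results; u: standard uncertainties; chi2_crit(n): chi-squared
    threshold for a subset of size n (95th percentile, n-1 d.o.f.)."""
    idx = list(range(len(x)))
    while len(idx) > 1:
        w = [1.0 / u[i] ** 2 for i in idx]
        y = sum(wi * x[i] for wi, i in zip(w, idx)) / sum(w)   # weighted mean
        contrib = [((x[i] - y) / u[i]) ** 2 for i in idx]      # chi^2 terms
        if sum(contrib) <= chi2_crit(len(idx)):
            break                                  # subset is consistent
        idx.pop(contrib.index(max(contrib)))       # drop the worst result
    return idx

# Approximate 95 % chi-squared thresholds for n-1 = 1, 2, 3 d.o.f.
crit = lambda n: {2: 3.84, 3: 5.99, 4: 7.81}[n]
```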
[Image: combined distributions for item A, before and after removal of inconsistent results.]
For item A, only the 80 °C results are inconsistent (according to Cox’s criterion), and the removal of inconsistent results to find the “Largest Consistent Subset” (second row of graphs above) does not, unfortunately, create a unimodal or symmetric plot.
[Image: combined distributions for item D, before and after removal of inconsistent results.]
For item D, all but the 23 °C results are inconsistent. Removing inconsistent results does improve the appearance of the combined probability distributions, though some asymmetry remains.

The LCS approach to finding an assigned value (or Key Comparison Reference Value, KCRV, for a Key Comparison between National Metrology Institutes) is intended to produce the best estimate of the SI value of the measurand [Cox, “The evaluation of key comparison data: determining the largest consistent subset”, Metrologia 44 (2007) 187–200]. This is seldom the goal for an interlaboratory comparison (ILC) between commercial calibration laboratories, where each participant simply aims to demonstrate his equivalence to a “credible” Reference Value. Bearing this in mind, and considering that the number of participants in a typical calibration ILC is small (7 or less, according to [EA-4/21 INF: 2018, “Guidelines for the assessment of the appropriateness of small interlaboratory comparison within the process of laboratory accreditation”]), the removal of participants from RV using a chi-squared test is not always reasonable: the reported uncertainties used to calculate chi-squared may be unrealistic, and the resultant number of results contributing to RV may be too small for statistical confidence. Instead, it is suggested that commercial calibration laboratories arranging a small ILC use a “consensus value from participant results” [ISO 13528:2015 section 7.7], using “a subset of participants determined to be reliable, by some pre-defined criteria … on the basis of prior performance” [13528 clause 7.7.1.1], with this “prior performance” being their reputation in the relevant field. An example of such a small ILC is presented below.

EXAMPLE: A SMALL INFRARED THERMOMETRY ILC

In this example, one infrared thermometer was calibrated from -20 to 150 °C. The pilot laboratory invited the other three laboratories to participate based on their reputation for competence in the field, so that all four laboratories might contribute to RV. Viewing conditions were specified (100 mm from a 50 mm diameter target, or 300 mm from a 150 mm target, etc), so that differing Size-of-Source Effect would not render the results incomparable. Three laboratories submitted multiple result sets (measured by different metrologists), but selected one set to contribute to RV, as required by the protocol. The results are presented below, in the sequence in which they were measured, with secondary results from relevant laboratories being identified as “x.2”:
[Table: IR-2025-05 results.]

VISUAL REVIEW OF DATA
The results are plotted, relative to the mean correction, below:
[Graph: results plotted relative to the mean correction.]
The expected tight grouping of results within a laboratory may be clearly seen in laboratories A, C and D. Good agreement between initial and final measurements at laboratory A (results A.1 and A.2) indicates that the thermometer was stable during the one-month circulation.
Kernel density plots of the four “primary” results (chosen for inclusion in RV) are shown below:
[Image: kernel density plots of the four primary results.]
It is observed that these kernel density plots allow for better visual review of the data than does the graph of results relative to the mean. (There is significant asymmetry in the 80 °C and 120 °C results, which was not obvious in the graph relative to the mean.)

a) U(RV) FROM PARTICIPANT UNCERTAINTIES
Two Reference Values are considered, the mean and the weighted mean. For each laboratory, RV is calculated from the results of the other three, to avoid the laboratory being “compared to itself”. The uncertainty of the mean is estimated from reported participant uncertainties as u(RV) = √(Σ u_i^2) / N, similar to the formula for the standard deviation of the mean. The variance of the weighted mean is found from reported participant uncertainties as u^2(RV) = 1 / Σ(1/u_i^2).
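These two estimates can be written as small helper functions (illustrative; the u_i would be the participants' reported standard uncertainties):

```python
import math

def u_mean(u_list):
    """Uncertainty of the unweighted mean from participant standard
    uncertainties: sqrt(sum of u_i^2) / N."""
    return math.sqrt(sum(ui ** 2 for ui in u_list)) / len(u_list)

def u_weighted_mean(u_list):
    """Uncertainty of the weighted mean: 1 / sqrt(sum of 1/u_i^2)."""
    return math.sqrt(1.0 / sum(1.0 / ui ** 2 for ui in u_list))
```

With equal uncertainties the two coincide (u/√N); with unequal uncertainties the weighted mean's uncertainty is dominated by, and never larger than, the smallest u_i.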
As seen below, either value of U(RV) is smaller than U(LV) for all but participant B.1 at lower temperatures, so RV tests the capabilities of laboratories A, C and D fairly rigorously. It does not, however, meet the criterion for the uncertainty of the assigned value, u(x_pt), relative to the “performance evaluation criterion” σ_pt, suggested in ISO 13528 clause 9.2.1, namely
[Image: ISO 13528 criterion, u(x_pt) ≤ 0.3∙σ_pt.]
(If this criterion is met, then the uncertainty of the assigned value may be considered to be negligible.)
[Table: Reference Values and their uncertainties.]
As mentioned above, the uncertainties reported by commercial calibration laboratories in ILCs differ widely, often without any technical justification. For this reason, it is recommended to use the unweighted mean as the Reference Value. However, as the goal of the ILC is to demonstrate equivalence between participants within their reported uncertainties, U(RV) is estimated from these reported uncertainties, not from the observed spread of results.

b) U(RV) FROM SPREAD OF PARTICIPANT RESULTS
The median is often used as a robust Reference Value for large ILCs, with its uncertainty being estimated from the spread of results via the scaled median absolute deviation (MADe). Is the median an appropriate Reference Value for a small ILC?
[Table: the median and its uncertainty.]
(Note: In the above table, the median and its uncertainty are calculated using all four results, i.e., not excluding the participant’s own result.)
It is observed that, while the median may be a reasonable Reference Value for such an ILC, the uncertainty of the median, being derived from the spread of the participants’ results, is not suitable for the task of demonstrating equivalence within reported uncertainties. To achieve this goal, U(RV) should be derived from the reported uncertainties. Also, U(median) is often larger than U(LV), so does not test participants’ capabilities rigorously.

CONCLUSION

∙Kernel density plots are recommended for visual review of data (the first step in analysis of ILC results).
∙The ILC report should present participants’ results in the order in which they were measured, so that readers may themselves look for artefact drift.
∙If a laboratory submits multiple result sets, it should choose one set to contribute to a consensus Reference Value, before having sight of other participants’ results.
∙The ILC report should indicate which result sets are from the same laboratory, so that readers may interpret data clustering correctly.
∙For small ILCs, each participant should be evaluated against a Reference Value derived from other participants’ results (to avoid bias).
∙To promote the credibility of RV, a subset of participants may be pre-selected to contribute to it, on the basis of “prior performance” (reputation in the field).
∙If a consensus Reference Value is to be used, the mean is recommended, rather than the weighted mean, as reported uncertainties often vary widely, without technical justification.
∙The uncertainty of a consensus Reference Value, U(RV), should be determined from participants’ reported uncertainties, rather than from the spread of results, to achieve the goal of demonstrating equivalence within reported uncertainties.
∙Technical assessors should ask: Is RV credible? (Is it composed of reputable labs?) Does it omit the laboratory being tested, especially if the number of participants is small? Is U(RV) < U(LV), in order to test the participant’s capability rigorously?

(Contact the author at LMC-Solutions.co.za.)

Interlaboratory comparisons (Proficiency Testing) among calibration laboratories – how to choose the assigned value (Reference Value)

ABSTRACT
Calibration laboratories routinely participate in interlaboratory comparisons (ILCs), also referred to as Proficiency Testing (PT), in which their measurement of an item should agree with the best estimate of its “actual” or “true” value within pre-established criteria. The PT provider (or participants) must decide how to estimate this “assigned value”, or “Reference Value” (RV), as well as its uncertainty, U(RV), before performing the ILC. Obtaining RV from a superior external (non-participant) laboratory is ideal, but circumstances may require RV to be estimated from participants’ results (consensus value). In the latter case, RV may be estimated from the mean, weighted mean or median of participant results, and, equally importantly, U(RV) may be estimated from prior experience, from participants’ calculated uncertainties or from the spread of participant results. Outliers may affect a consensus value significantly, particularly when there are few participants (small ILC) – robust estimators aim to reduce such effects. This paper discusses approaches to estimating RV and U(RV), for PT in the field of calibration, and particularly for small ILCs. Examples from Thermometry are presented.

INTRODUCTION

Accredited calibration laboratories are required to participate in Proficiency Testing (PT) / interlaboratory comparisons (ILC), to monitor the validity of their results [ILAC-P9:01/2024, “ILAC Policy for Participation in Proficiency Testing Activities”]. This involves measuring an item (for example, the mass of a weight, or the correction of a digital thermometer at certain temperatures) and then comparing the laboratory’s result to the item’s “assigned value” (x_pt, also called Reference Value, RV), which should represent the “true” value (traceable to the relevant SI unit(s)), within a particular uncertainty (u(x_pt), or U(RV)). The goal is to demonstrate “equivalence to the SI”, within a certain uncertainty. In practice, this often becomes “equivalence to some reputable laboratories performing such calibrations”, or “equivalence to some other laboratories performing such calibrations”.

In which circumstances would “equivalence to some other laboratories performing such calibrations” constitute evidence of “equivalence to the SI”? If the laboratories use independent measurement standards, equipment, methods and personnel, the chances of their results agreeing (within their uncertainties) while being in error (by more than their uncertainties) should hopefully be small. Note: Laboratories may make similar, large errors and thereby invalidate this argument: examples are – not applying isotopic correction to water triple point temperature [CCT-K7.2002], using inadequate immersion depth for a short thermometer, or using too small a target size for an infrared radiation thermometer. The greater the variety of equipment and methods used, the smaller the chance of such errors affecting all participants equally and thus going unnoticed. Though calibration labs generally use similar, generic calibration methods, differences in, for example, stabilisation time allowed, medium used to compare thermometers (liquid or air bath), or distances used to estimate Size of Source Effect of an infrared thermometer, may cause differences between participant results that allow errors to be identified.
As multiple PT participants within one laboratory usually share the same measurement standards, equipment and method, their results often cluster together more closely than results from different laboratories. For this reason, it adds greatly to the understanding of the result set, if it can be seen which participants are from the same laboratory. This can be done, while maintaining confidentiality, by using participant codes such as “A.1, A.2, A.3” from laboratory A, “B.1, B.2” from laboratory B, etc.

In the following sections, the determination of RV and U(RV) using various approaches will be discussed. References to paragraph numbers from ISO 13528:2015 [ISO 13528:2015, “Statistical methods for use in proficiency testing by interlaboratory comparison”] are included for convenience.

7.5 RESULT FROM ONE LABORATORY

The simplest choice of RV is the result of a better laboratory (often a National Metrology Institute, or NMI), not participating in the ILC, that achieves a smaller calibration uncertainty than any participant. Then, U(RV) may be simply the expanded uncertainty reported by this laboratory, with a component for the PT item’s instability during circulation added in quadrature, if such instability is significant. (u_stab is insignificant relative to u_cal if u_stab < 1/3∙u_cal.)
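The quadrature combination and the one-third rule can be sketched as follows (Python rather than a spreadsheet; the function name and numbers are illustrative):

```python
import math

def u_rv(u_cal, u_stab):
    """Reference-lab standard uncertainty combined in quadrature with
    the PT item's instability contribution."""
    return math.sqrt(u_cal**2 + u_stab**2)

# The one-third rule: at u_stab = u_cal/3, the combined standard
# uncertainty exceeds u_cal by only ~5 %, so the stability term is
# insignificant below that threshold.
u_cal, u_stab = 0.30, 0.10
combined = u_rv(u_cal, u_stab)  # ~0.316, about 5 % above 0.30
```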
Note: Even if the “superior” laboratory is accredited, or is an NMI, its result may not be flawless, and should be used with care.

7.7 CONSENSUS VALUE FROM PARTICIPANT RESULTS

A superior, non-participant result is not always available. Then, RV must be estimated from participant results (either all participants, or a subset of participants “determined to be reliable, by some pre-defined criteria, such as accreditation status” [13528 clause 7.7.1.1]).
In general, the more participants contributing to RV, the better. However, noting the observation above about clustering of results from one laboratory, it is suggested that the same number of results from each laboratory be included in RV, so as not to bias RV towards the most populous laboratory(ies). In practice, this usually means choosing one result from each laboratory to contribute to RV – this choice may be made by the laboratory itself (before having sight of the full result set).

When there are “few” participants (up to 7, according to EA-4/21 [EA-4/21 INF: 2018, “Guidelines for the assessment of the appropriateness of small interlaboratory comparison within the process of laboratory accreditation”], or fewer than 30, according to the IUPAC/CITAC Guide [IUPAC/CITAC Guide: Selection and use of proficiency testing schemes for a limited number of participants – chemical analytical laboratories (IUPAC Technical Report), Pure Appl. Chem., Vol. 82, No. 5, pp. 1099–1135, 2010]), additional care is required in the choice of RV and U(RV): outliers may be difficult to detect, and inclusion or exclusion of one result may significantly change RV.
For an ILC with few participants, but using a consensus RV, it is suggested that each participant’s result be compared to a Reference Value that excludes its own result. In this way, the participant does not unreasonably bias RV “towards itself”. (In the extreme example of a bilateral comparison, if participant A’s result is (0.00 ± 0.45) and participant B’s result is (1.00 ± 0.45), it is clear that the results do not agree. However, if RV is chosen as the mean of A and B, each participant’s difference from the Reference Value, |LV-RV|, becomes 0.5 and both results pass with |En| < 1!) In other words, for laboratory A, RV should be calculated from the results of laboratories B, C and D (if there are four participating laboratories), etc. This approach has the drawback that RV is different for each participant, but has the larger benefit that significant bias is removed from RV. (As mentioned in 13528 clause 5.4.1, “a participant could be evaluated against an inappropriate comparison group”.)
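The leave-one-out scoring can be sketched in Python (the `en_score` function is illustrative; all uncertainties are expanded, k=2), reusing the bilateral example above:

```python
import math

def en_score(x, U_x, rv, U_rv):
    """En = (x - RV) / sqrt(U(x)^2 + U(RV)^2), with expanded (k=2)
    uncertainties; |En| <= 1 indicates agreement."""
    return (x - rv) / math.sqrt(U_x**2 + U_rv**2)

# Bilateral example from the text: A = 0.00 +/- 0.45, B = 1.00 +/- 0.45.
# Leave-one-out: A is scored against an RV formed from B alone.
en_A = en_score(0.00, 0.45, 1.00, 0.45)  # |En| ~ 1.57 > 1: disagreement visible
```

Scoring A against the mean of A and B would instead give |LV-RV| = 0.5 and a misleading |En| < 1, as described above.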

a) U(RV) FROM PARTICIPANT UNCERTAINTIES
As a calibration result always includes an estimated uncertainty (which is not the case for many testing results), the uncertainty of a consensus value may be estimated from the uncertainty of each contributing result. If reported uncertainties are considered reliable, the contribution of each result to the average may be weighted by the inverse of its variance (the square of its standard uncertainty, u_i) – the result is the weighted mean, x̄_w = Σ(x_i/u_i²) / Σ(1/u_i²), which would then be RV. The variance of the weighted mean is given by u²(x̄_w) = 1 / Σ(1/u_i²), and U(RV) = 2·u(x̄_w) (k = 2, for ~95% coverage). The benefit of this approach is that, if participant uncertainties are reliable, they are taken into account appropriately in forming both RV and U(RV). (The approaches in b), below, do not take estimated uncertainties into account at all, only the spread of results.) The drawback of the weighted mean is that U(RV) does not include any factors arising from instability/drift of the ILC item (which would only become apparent during circulation of the item).
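A minimal sketch of the weighted mean in Python (the input values are invented for illustration):

```python
def weighted_mean(results, u):
    """Inverse-variance weighted mean and its standard uncertainty (k=1)."""
    w = [1.0 / ui**2 for ui in u]
    xw = sum(wi * xi for wi, xi in zip(w, results)) / sum(w)
    u_xw = (1.0 / sum(w)) ** 0.5
    return xw, u_xw

# Illustrative values: three contributing results with k=1 uncertainties.
xw, u_xw = weighted_mean([0.10, 0.20, 0.05], [0.05, 0.10, 0.05])
# RV = xw; U(RV) = 2 * u_xw (k=2), plus any item-instability component.
```

Note that the result with the larger uncertainty (0.10) pulls the mean only weakly, since its weight is a quarter of the others'.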

b) U(RV) FROM SPREAD OF PARTICIPANT RESULTS
If participant uncertainties are not reliable, the simple (unweighted) mean may be used as RV, and the Experimental Standard Deviation of the Mean (ESDM), multiplied by Student’s t-factor (to expand to 95% level of confidence while compensating for limited degrees of freedom), used as U(RV).

However, an estimator that is more robust against outliers is the median [13528 clause C.2.1], which is the middle value when participant results x_1, x_2, …, x_p are sorted in increasing order. Its standard deviation may be estimated by the scaled median absolute deviation (MADe), which is calculated as follows [13528 clause C.2.2]:
MADe = 1.483 × median( |x_i - median(x)| )
This “robust standard deviation” s* (= MADe) is converted to standard uncertainty of the assigned value, u(x_pt), as follows [13528 clause 7.7.3]:
u(x_pt) = 1.25 × s* / √p
where p is the number of participants contributing to RV. This may then be expanded to 95% level of confidence by multiplying by Student’s t-factor for degrees of freedom p-1.
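The robust statistics can be sketched as follows (Python; the function names and the result set, which contains one deliberate outlier, are illustrative):

```python
import statistics

def made(results):
    """Scaled median absolute deviation: MADe = 1.483 * MAD [13528 C.2.2]."""
    med = statistics.median(results)
    return 1.483 * statistics.median([abs(x - med) for x in results])

def u_assigned(results):
    """u(x_pt) = 1.25 * s* / sqrt(p) [13528 clause 7.7.3], with s* = MADe."""
    return 1.25 * made(results) / len(results) ** 0.5

# Illustrative result set with one outlier (0.90): the median (0.15) and
# MADe are unaffected by how far the outlier lies from the others.
r = [0.10, 0.20, 0.15, 0.90, 0.12]
```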

EXAMPLE: AN INFRARED THERMOMETRY ILC

The tables below show corrections for four infrared thermometers (A to D) at five temperatures (from 23 to 80 °C), reported by a superior reference laboratory (Ref Lab) and by 13 ILC participants (P1 to P13). As recommended in 13528 clause 6.4, graphs are presented for visual review of data. (In this case, kernel density plots [13528 clause 10.3] are prepared for items A and D.) Participant results are compared to various assigned values (Ref Lab, mean, weighted mean and median), and participant performance evaluated using En scores [13528 clause 9.7]. The results and choices of RV are discussed below.

[Tables R1, R2, R3 and R4: ILC of calibrations of infrared thermometers A, B, C and D.]

[Kernel density plots KDE_A and KDE_D, for items A and D.]

The following observations are made, regarding the above data:
∙The kernel density plots indicate that results at most temperatures are multi-modal and not symmetric, which may cause problems in data analysis, as mentioned in 13528 clause 5.3 (“Most common analysis techniques for proficiency testing assume that a set of results … will be approximately normally distributed, or at least unimodal and reasonably symmetric”). Even robust analysis techniques [13528 Annex C] assume symmetric distribution of data.
∙The spread of participant results, as represented by U(mean) or U(median), is larger than the uncertainties estimated by participants, represented by U(weighted mean), at all temperatures, but the difference is particularly large at 80 °C. This indicates that participants either neglected significant uncertainty components, or that the items drifted significantly during circulation, or both.
∙Some results of Ref Lab, particularly that for item A at 80 °C, are suspect when compared to the median of participant results. As mentioned above, even a reliable reference laboratory should be used with care.

CONCLUSION

∙Obtaining an assigned value for a PT scheme from a superior reference laboratory has the benefit of simplicity, but, as shown in a real-world example, such a Reference Value should still be checked for consistency with a consensus value calculated from participant results [13528 clause 7.8].
∙The use of participant uncertainties to estimate U(RV), such as in the weighted mean, may under-represent certain uncertainty contributions, such as drift of the item, but is still useful to demonstrate equivalence of laboratories’ results, which is one of the purposes of Proficiency Testing.
∙Using the spread of participant results to estimate U(RV), such as in the scaled median absolute deviation (MADe), captures effects such as drift in the item, but does not incorporate the participants’ own estimates of uncertainty into U(RV).
∙Visual review of PT data, for example using kernel density plots, is a convenient way to obtain an overview of what may be a large and complex data set.
∙Two techniques for removing bias from a consensus value are: (i) If there are multiple participants from some laboratories, use only one result from each laboratory to contribute to RV. (ii) If there are few participants, compare each participant result to a consensus value determined from the other participants’ results, i.e., excluding the participant’s own result.

Contact the author at LMC-Solutions.co.za.

Interpolating between discrete calibration points: the effect on uncertainty

Measuring instruments are often calibrated at relatively few points, but may be used at any point in the calibrated range. What is the calibration uncertainty at intermediate points? This article discusses some examples from Thermometry, using real calibration data for thermocouples and PRTs.

LIQUID-IN-GLASS THERMOMETERS

For liquid-in-glass thermometers, calibration every 50 to 100 scale divisions is considered adequate: “For thermometers not graduated above approximately 200 °C, it is generally accepted that the interval between test points should not exceed 100 scale divisions, if the corrected temperature values between the calibration points are to have an expected accuracy of approximately one-half of one scale division. If accuracies of one or two-tenths of a scale division are desired, it will be necessary to reduce the calibration interval to every 40 or 50 scale divisions. If a thermometer is graduated above 200 °C, a 40 to 50 scale division calibration interval is required to produce corrected temperature values with expected accuracies of approximately one-half of a scale division, and a 20 to 25 scale division calibration interval is necessary for expected accuracies to be approximately one or two-tenths of a scale division.” [Wise, “Liquid-in-Glass Thermometry”, NBS Monograph 150, 1976]

NOBLE-METAL THERMOCOUPLES & PRTS (ITS-90)

For “electrical” sensors such as resistance thermometers (RTDs) and thermocouples, the “quality” of the reference function (or reference tables) is often the important factor: If it represents any complex structure in the resistance or voltage vs temperature relationship well, then the deviation of a sensor from the reference function will be smooth and simple, requiring few calibration points, even over wide temperature ranges. For example, the 1993 NIST reference function for type R (Pt-Rh/Pt) thermocouples [Burns et al, NIST Monograph 175], reproduced in IEC 60584-1 of 1995 and 2013, is good enough to allow accuracies of (0.1 to 0.2) °C between calibration points several hundred degrees Celsius apart.

[Figure Dev_R1: this type R thermocouple’s calibration data, at ~200 °C intervals, deviate from a straight line by ±0.05 °C.]
[Figure Dev_R2: this type R thermocouple’s calibration data, at ~200 °C intervals, deviate from a straight line by ±0.2 °C.]

The ITS-90 PRT reference functions allow SPRT accuracy of 0.005 °C and IPRT accuracy of 0.02 °C between calibration points separated by several hundred degrees Celsius.

[Figure Dev_SPRT: this SPRT’s calibration data, at ~200 °C intervals, deviate from a straight line by ±0.003 °C.]
[Figure Dev_IPRT: this IPRT’s calibration data, at ~200 °C intervals, deviate from a straight line by ±0.02 °C.]

BASE-METAL THERMOCOUPLES & IPRTS (CALLENDAR-VAN DUSEN)

However, other sensors, such as the type K thermocouple, suffering from older (and less “refined”) reference functions and from temperature-induced drift, do not allow such accurate interpolation.

[Figure Dev_K: this type K thermocouple’s calibration data, at ~100 °C intervals, deviate from a straight line by ±0.7 °C.]

The above-mentioned IPRT, with deviations calculated against the simpler Callendar-van Dusen equations, now shows larger interpolation errors, up to 0.05 °C:

[Figure Dev_IPRT_CvD: this IPRT’s calibration data vs the Callendar-van Dusen equations, at ~200 °C intervals, deviate from a straight line by up to 0.05 °C.]

EXTRAPOLATION – GENERIC EXAMPLE

It is sometimes necessary to use a calibrated measuring instrument “slightly” outside its range of calibration, for example, using a thermometer at -1 °C that is calibrated from 0 to 100 °C. How far is it reasonable to extrapolate?: It depends on the type of sensor and on the range for which the sensor and associated electronics have been designed.
If either the sensor or the electronics is close to the limit of its operating range, “poor” behaviour (errors varying significantly more, and more rapidly, than in the middle of the range) may occur. We will consider the ideal case, where the behaviour of the sensor is well understood (and well modelled by the reference function being used) also beyond the point of interest, and the electronics is expected to behave consistently within and outside the calibrated range.

[Figure LS_uncert_propagation: uncertainty at intermediate and extrapolated points, for a constant, linear or quadratic curve fitted to three calibration points.]

In this idealised example, there is no correlation between the three calibration points, all of which have an uncertainty of 1 at the calibration points 0, 50 and 100 (arbitrary units). Extrapolating 10% beyond the calibration range (to -10 or 110), the uncertainty of the quadratic curve increases to 1.4, so, in ideal conditions, it may be justifiable to extrapolate 10% beyond the calibration range and estimate the uncertainty there as 1.5 times the uncertainty within the range. (This very simplified analysis predicts an increase in uncertainty of 12.5% when extrapolating by 4% of range, which is comparable to the 20% increase reported by Nakano [Nakano & Tamura, “Extrapolation of the ITS-90 down to the boiling point of nitrogen from the triple point of argon”, J Phys Conf Series, 2018] when extrapolating to -196 °C from a -189 to 0 °C SPRT calibration.)
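The quadratic case can be reproduced with a short sketch (Python; it assumes, as above, three uncorrelated calibration points of unit uncertainty, so the fitted parabola passes exactly through them and can be written in the Lagrange basis):

```python
import math

def quad_u(x, nodes=(0.0, 50.0, 100.0), u_point=1.0):
    """Uncertainty of the quadratic through three uncorrelated calibration
    points: the fitted value is sum(L_i(x) * y_i) in the Lagrange basis,
    so u(x) = u_point * sqrt(sum of L_i(x)**2)."""
    total = 0.0
    for i, xi in enumerate(nodes):
        Li = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                Li *= (x - xj) / (xi - xj)
        total += Li**2
    return u_point * math.sqrt(total)

# quad_u(50.0) is 1.0 (at a calibration point); quad_u(-10.0) is ~1.40
# (extrapolating 10 % of range below the lowest point).
```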

(Contact the author at LMC-Solutions.co.za.)

Conformity to specification – calculating risk and implementing a decision rule in Excel or Calc

When reporting a measurement result, a calibration or testing laboratory may state whether or not it conforms to a certain specification. When reporting whether the result is a “pass” or a “fail”, the laboratory must clearly describe how it accounts for measurement uncertainty in making its decision. (This description of how uncertainty is treated is called a “decision rule”.)
Section 7.8.6 of ISO/IEC 17025:2017 requires the laboratory to “tak(e) into account the level of risk (such as false accept…)”. Guidance on decision rules and the associated risk to the customer may be found in ILAC-G8:09/2019 “Guidelines on Decision Rules and Statements of Conformity” and JCGM 106:2012 “Evaluation of measurement data – The role of measurement uncertainty in conformity assessment”. We will assume that the customer mostly wishes to avoid a “False Accept”, and will therefore calculate the “Probability of False Accept” (PFA) [ILAC-G8 5.2], for which some maximum acceptable value should be agreed between laboratory and customer. We will also determine which statement (“pass”, “fail”, “conditional pass” or “conditional fail”, as per ILAC-G8 4.2.3) to report for a particular guard band. These calculations will be performed using commonly available spreadsheet functions.

We will assume that measured results follow a normal (or Gaussian) distribution, with standard deviation equal to the estimated standard uncertainty, u(k=1). (Note that if the dominant uncertainty component is not normally distributed, the combined uncertainty is unlikely to be [EA-4/02 S8.3], so this assumption will be inappropriate.)
We will use the same numbers as the example in JCGM 106 7.4, so we can check our answers against theirs: result = 13.6, u(k=1) = 1.8, tolerance limits TL = 12.5 and TU = 16.3.

Spreadsheet programs such as Microsoft Excel and LibreOffice Calc include functions that operate on the “standard” normal distribution, which has a mean of 0 and a standard deviation of 1: (The probability density function below is plotted using the spreadsheet function NORM.DIST, with arguments x = -3 to 3, mean = 0, stdev = 1, cumulative = 0.)
[Figure: probability density function of the standard normal distribution.]

However, the normal distribution we start with has a mean equal to the measured result, and a standard deviation equal to the combined standard uncertainty, u(k=1):
[Figure: normal distribution centred on the measured result, with standard deviation u(k=1).]

How do we scale and move our distribution, so it becomes a “standard” one, so that we may use the available spreadsheet functions? (In particular, what do the tolerance limits become, when we transform the distribution?): We subtract the result from the x-values (to centre the distribution around 0), and divide by u(k=1) (to change its “width” to 1): TL’ = (TL – result) / u = -0.61 and TU’ = (TU – result) / u = 1.50.

Now, how do we find the probability that the true value lies within the tolerance limits?: We find the area under the curve of the “standard” normal distribution, between the new tolerance limits, using the function GAUSS, which integrates from 0 to the given value. (The total area under the curve, from -∞ to ∞, is 1.00.): “Probability of True Accept” (PTA) = GAUSS(TU’) – GAUSS(TL’) = GAUSS(1.50) – GAUSS(-0.61) = 0.663 = 66.3%. (As the GAUSS function integrates from 0 to a value, it must be applied twice, to obtain the areas to the right and to the left of the mean. GAUSS(TL’) is negative because TL’ lies below the mean, so subtracting it adds that area to the total.) This probability agrees with that found in JCGM 106 7.4, indicating that we probably performed the calculation correctly.
The “Probability of False Accept” (PFA) is 1.00 – PTA = 1.00 – 0.663 = 0.337 = 33.7%.
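The same calculation can be checked outside the spreadsheet, for example with a short Python sketch using the standard normal CDF (the function names are illustrative; `math.erf` provides the error function):

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pfa(result, u, TL, TU):
    """Probability that the true value lies outside [TL, TU],
    assuming a normal distribution centred on the result."""
    pta = phi((TU - result) / u) - phi((TL - result) / u)
    return 1.0 - pta

# JCGM 106 7.4 numbers used in the text:
# pfa(13.6, 1.8, 12.5, 16.3) is ~0.337 (33.7 %), matching the GAUSS result.
```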

Now, we will implement a “Non-binary Statement with Guard Band”, as per ILAC-G8 4.2.3, using the spreadsheet functions IF, AND and OR to determine whether the statement of conformity should be “Pass” (result within acceptance interval, with guard bands w = rU, where U is the expanded uncertainty and r = 1 leads to PFA < 2.5%), “Fail” (result outside tolerance, with Probability of False Reject < 2.5%) or “Conditional” Pass or Fail (as per Fig 5 in ILAC-G8, reproduced below).
[Figure: Fig 5 from ILAC-G8.]

By nesting several IF(test, value_if_TRUE, value_if_FALSE) functions, all four conditions are identified:
IF(OR(result + r*U < TL, result - r*U > TU), "Fail", IF(AND(result - r*U >= TL, result + r*U <= TU), "Pass", IF(AND(result >= TL, result <= TU), "Conditional pass", "Conditional fail")))

Contact the author at LMC-Solutions.co.za.
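The same nested-IF logic can be expressed as a Python sketch (illustrative, not the article’s spreadsheet implementation):

```python
def conformity(result, U, TL, TU, r=1.0):
    """Four-way conformity statement per ILAC-G8 4.2.3, guard band w = r*U."""
    w = r * U
    if result + w < TL or result - w > TU:
        return "Fail"            # whole uncertainty interval outside tolerance
    if result - w >= TL and result + w <= TU:
        return "Pass"            # whole uncertainty interval inside tolerance
    if TL <= result <= TU:
        return "Conditional pass"  # in tolerance, but interval straddles a limit
    return "Conditional fail"      # out of tolerance, but interval straddles a limit
```

With the JCGM 106 numbers (result 13.6, U = 2u = 3.6, limits 12.5 and 16.3), this returns "Conditional pass": the result is in tolerance but its uncertainty interval crosses both limits.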

Comparing temperature sensors in an unstable environment: the effect of response time

When we calibrate a temperature sensor by comparing it to a reference standard thermometer, we try to keep the comparison medium (hereafter called the “environment”) stable in temperature during the period of comparison. (By “stable”, we mean that the fluctuation in the temperature of the environment is small relative to the desired uncertainty of calibration.) But, what happens if the environment temperature does vary significantly during the process of recording reference standard and unit under test (UUT) readings? This article discusses the comparison of temperature sensors with different response times in an unstable environment. It reports measurements of time constants of several types of sensors, and their comparison in a stirred water bath with large temperature fluctuations (on-off, rather than PID, control). The appropriate choice for the uncertainty contribution arising from temperature instability is then justified by an analysis of the measured data.

Temperature sensors do not respond instantaneously to changes in the temperature of the environment they are intended to measure. If a sensor is plunged into an environment, the rate at which the sensor temperature approaches that of the environment is described by Newton’s law of cooling, which leads to the equation
T(t) = T_env + (T_0 - T_env)·e^(-t/t0)     (1)
[J. V. Nicholas, D. R. White, Traceable Temperatures, 2nd ed, 2001, p 141], where t is the time since the sensor was plunged into an environment of temperature T_env, T_0 is the initial temperature of the sensor, and t0 is the time constant.
t0 is the time the sensor takes to undergo ~63% of the total change in temperature. It is a function of the heat capacity, C, of the sensor and the heat transfer coefficient, h, from environment to sensor: t0 = C/h. It may be determined by measuring T versus t and modelling the data using equation (1).
(Note that, although the heat capacity is constant for a particular sensor, the heat transfer coefficient varies according to the environment, so that the time constant of a sensor in still air may be tens of times longer than the time constant of the same sensor in stirred water.)
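Equation (1) can be illustrated with a short Python sketch: synthetic plunge data are generated from it, and the time constant is then recovered from the slope of a straight line fitted to ln(T_env - T) vs t (the values are illustrative, chosen to resemble the stirred-water-bath measurement below):

```python
import math

# Synthetic plunge data from equation (1): T(t) = T_env + (T0 - T_env)*exp(-t/t0).
T_env, T0, t0_true = 70.0, 15.0, 4.3    # degC, degC, s (illustrative)
ts = [0.5 * k for k in range(1, 30)]    # sample times, s
Ts = [T_env + (T0 - T_env) * math.exp(-t / t0_true) for t in ts]

# Linearise: ln(T_env - T) = ln(T_env - T0) - t/t0, then fit a straight
# line by least squares; the time constant is -1/slope.
ys = [math.log(T_env - T) for T in Ts]
n = len(ts)
mx, my = sum(ts) / n, sum(ys) / n
slope = sum((t - mx) * (y - my) for t, y in zip(ts, ys)) / sum((t - mx) ** 2 for t in ts)
t0_fit = -1.0 / slope  # recovers ~4.3 s
```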

Infrared radiation (non-contact) thermometers typically respond very quickly (response time < 1 s), but this article focuses on immersion probes, which are in physical contact with the environment. The sensors studied were a mineral-insulated, metal-sheathed (MIMS) platinum resistance thermometer (PRT) and type K thermocouple, an epoxy-coated NTC thermistor and a ceramic-sheathed type S thermocouple, with a brief mention of a solid-stem mercury-in-glass thermometer.

First, the time constants were determined by plunging the sensors from an ambient temperature of 15 °C into a stirred water bath controlled at 70 °C. The temperature readings of the PRT are plotted vs time, below:

[Figure PRT_T_vs_t: temperature readings of the PRT vs time.]

Equation (1) was manipulated to allow a straight line to be fitted:
ln(T_env - T(t)) = ln(T_env - T_0) - t/t0    (1a)
The linearised data and fitted straight lines are plotted, below:

[Figure PRT_ln(-dT)_vs_t: linearised PRT data with fitted straight line.]

[Figures: linearised data and fitted straight lines for the type K thermocouple, NTC thermistor and type S thermocouple.]

The following time constants were found. (Time is in units of days in the straight lines fitted to the data above, therefore, so is t0. It is converted to seconds in the table below.)

Sensor                                                 t0 (seconds)
6.4 mm diameter MIMS PRT                                    4.3
6 mm diameter MIMS type K thermocouple                      2.8
Epoxy-coated NTC thermistor                                 4.5
6.7 mm diameter ceramic-sheathed type S thermocouple       13.0

The following points were noted:
1. The multimeter used to measure both RTD resistances and thermocouple voltages was set to a slow measurement rate (signal integration over 5 power line cycles or 0.1 s), which may have affected the measured values somewhat.
2. The longer time constant of the ceramic-sheathed type S thermocouple is expected, as the alumina ceramic is less thermally conductive than a metal sheath, there is an air gap between this sheath and the twin-bore alumina insulator holding the thermocouple wires, and these wires are probably thicker (0.4 mm) than those in, for example, the MIMS type K thermocouple. Such a high-temperature laboratory standard thermocouple construction is expected to exhibit the longest time constant of the sensors typically calibrated in a temperature calibration lab.
3. The results show fair agreement with previous measurements of t0 = 5.2 s and 2.9 s, for a heavier 6.4 mm MIMS PRT and a 6.25 mm solid-stem mercury-in-glass thermometer, respectively.

The water bath controller was then adjusted from PID to on-off control, resulting in a control cycle of ±0.5 °C with a period of ~140 s. The following measurements were recorded from the four sensors (vertically offset on the graph, for greater clarity):

[Figure On-off_T_vs_t: readings of the four sensors during on-off control, vertically offset.]

From these 58 measurements, over 10 control cycles (23 minutes), the following results were obtained. (Fluctuation in the absolute temperature reading of each sensor is labelled “abs” in the table below. The PRT was considered as the reference standard: the mean deviations of the other sensors relative to the PRT, labelled “dev” below, would be the results reported in their calibration certificates. The experimental standard deviation of the mean, or ESDM, is s/√n, where s is the standard deviation and n is the number of readings.)

               PRT     Type K          NTC             Type S
               abs     abs     dev     abs     dev     abs     dev
Mean_all (°C):                 0.65            -0.50           -1.19
Std dev (°C):  0.29    0.31    0.12    0.30    0.07    0.25    0.11
ESDM (°C):     0.04    0.04    0.02    0.04    0.01    0.03    0.01

A sample of six measurements was taken from these 58, representing a data set that might be captured during a typical calibration:

                          PRT     Type K          NTC             Type S
                          abs     abs     dev     abs     dev     abs     dev
Mean_sample (°C):                         0.68            -0.48           -1.15
Std dev (°C):                             0.15            0.10            0.04
ESDM (°C):                                0.06            0.04            0.02
Mean_smp – mean_all (°C):                 0.03            0.02            0.04

The following points were noted:
1. The standard deviations of the “absolute” readings are smaller for those sensors with longer time constants (particularly, the type S thermocouple). This is as expected: the sensor response to a sinusoidally varying input is attenuated as the time constant becomes larger relative to the period of oscillation. (In other words, a sensor with more thermal inertia will exhibit less peak-to-peak variation in reading, the faster the control cycle.)
2. The mean of the 10% sample agrees with the mean of all 58 measurements (considered representative of the population) within approximately the ESDM of the sampled deviations: this level of agreement between the sample mean and the population mean suggests that the ESDM of the deviations is the appropriate value to use for the uncertainty component “temperature instability”.
3. The standard deviation of the absolute readings (~0.3 °C, as expected for a control cycle of ±0.5 °C) is about three times larger than the standard deviation of the deviations. Using the standard deviation of the absolute readings of the UUT (or, even more so, the standard deviations of the absolute readings of both reference standard and UUT) as the value(s) for “temperature instability” in the uncertainty budget would overestimate this contribution to the total uncertainty of calibration.

The period of oscillation of the environment’s temperature (~140 s) was long relative to the time constants of the sensors (3 to 13 s). In such a situation, a lag error of up to ~[(13 – 3) s * 0.01 °C/s] = 0.1 °C might exist between the slowest and fastest sensors evaluated, during a part of the control cycle when the environment temperature was monotonically falling or rising. (The rate of cooling is ~0.01 °C/s in the above data, and the rate of heating somewhat faster.) However, provided that the recorded data originate from varying points along the control cycle (not always when the reference standard is rising, for example, but equally often when it is rising and when it is falling), the fluctuation in the difference between reference standard and UUT provides an adequate estimation of the uncertainty contributed by the unstable environment. (The slow sensor will be colder than the fast sensor while the environment is heating, and hotter than the fast one while the environment is cooling: using enough measurements, taken randomly during several control cycles, the average of the differences between the slow and fast sensor will cancel out the effect of differing response time.)
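The cancellation argument can be illustrated with a simple simulation (Python; a sinusoidal fluctuation of ±0.5 °C with a 140 s period is assumed rather than the bath’s actual on-off waveform, and the time constants 3 s and 13 s are taken from the measurements above):

```python
import math

def simulate(tau, period=140.0, amp=0.5, dt=0.01, cycles=10):
    """First-order sensor, dT_s/dt = (T_env - T_s)/tau, tracking a
    sinusoidally fluctuating bath; returns the sensor readings."""
    n = int(round(cycles * period / dt))
    T_s, out = 0.0, []
    for k in range(n):
        T_env = amp * math.sin(2.0 * math.pi * k * dt / period)
        T_s += dt * (T_env - T_s) / tau  # Euler step of Newton's law
        out.append(T_s)
    return out

fast = simulate(tau=3.0)   # e.g. the MIMS type K
slow = simulate(tau=13.0)  # e.g. the ceramic-sheathed type S
diffs = [s - f for s, f in zip(slow, fast)]
mean_diff = sum(diffs) / len(diffs)
peak_diff = max(abs(d) for d in diffs)
# The instantaneous lag error (peak_diff, ~0.2 degC here) is far larger
# than its average over many whole cycles (mean_diff, a few mK).
```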

Contact the author at LMC-Solutions.co.za.

Uncertainty of measurement: pressure dead-weight tester

Pressure dead-weight testers (DWTs), also known as pressure balances or piston gauges, are used as reference standards for high-accuracy calibration of hydraulic and pneumatic pressure gauges (as well as other DWTs, via the technique of cross-floating). This article estimates the uncertainty of calibration of a pressure gauge by comparison with such a DWT, identifying the uncertainty components that are typically dominant.

The principle of the dead-weight tester is that the pressure exerted by the fluid (usually oil or air) is balanced by a known weight applied to a known surface area (pressure = force / area).

The known weight is provided by calibrated masspieces, together with knowledge of local gravity. The effective area of the piston-cylinder assembly is determined by dimensional measurement, or by comparison with other pressure standards (e.g., cross-floating).

The equation for gauge pressure measured by a DWT is
p = [Σm·g·(1 - ρ_a/ρ_m) + γ·C] / [A_0·(1 + λ·p)·(1 + α·ΔT)] + ρ_f·g·Δh.
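As an illustrative sketch (Python; all symbols in SI units, the function name and example values hypothetical), the standard pressure-balance equation p = [Σm·g·(1 - ρ_a/ρ_m) + γ·C] / [A_0·(1 + λ·p)·(1 + α·ΔT)] + ρ_f·g·Δh can be solved by fixed-point iteration, since p appears on both sides:

```python
def dwt_pressure(m, g, rho_a, rho_m, gamma_C, A0, lam, alpha, dT,
                 rho_f, h, iterations=5):
    """Gauge pressure from the pressure-balance equation, solved by
    fixed-point iteration (the distortion term contains p itself)."""
    force = m * g * (1.0 - rho_a / rho_m) + gamma_C
    p = force / A0  # first guess: no distortion, no head correction
    for _ in range(iterations):
        p = force / (A0 * (1.0 + lam * p) * (1.0 + alpha * dT)) + rho_f * g * h
    return p

# Hypothetical example: 10 kg load on a 1 cm^2 piston with all
# corrections switched off gives ~0.981 MPa gauge.
```

Because λ·p is small, the iteration converges in a few steps; each correction can be switched off (set to zero) to see its individual effect on p.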

Σm is the combined mass of the piston, sleeve (weight carrier) and the weights used; the uncertainty of their calibration (around F2 level) is usually a minor contributor. Gravitational acceleration, g, may be measured to parts in 10^7 or better, in which case its uncertainty is negligible. However, if it is estimated by a formula, u(g) may be significant. The certificate typically reports conventional masses, so the buoyancy correction (1 - ρ_a/ρ_m) uses conventional air and weight densities (1.2 kg.m^-3 and 8000 kg.m^-3), not the actual ones. The buoyancy correction itself is around 1.2/8000 ≈ 1.5×10^-4, relative: the additional correction for deviation from conventional air density, and its uncertainty, is often negligible below 300 m altitude.

The surface tension correction, γ·C (γ is the surface tension of the fluid and C the circumference of the piston), is unimportant for pneumatic systems. For hydraulic ones, the contribution to total weight may be large enough to be worth correcting for, but its uncertainty has a negligible effect.

The effective area at zero pressure, A_0, has a relative uncertainty typically of some parts in 10^5: this is usually the dominant component of total DWT uncertainty, especially at high pressures. The pressure distortion coefficient, λ, makes the effective area increase slightly with pressure, so the correction grows with pressure, reaching its maximum at, say, 500 MPa. If its uncertainty is taken as 10% of its value, it may be significant at very high pressures, but is negligible compared to u(A_0) at lower pressures. The thermal expansion coefficient, α, of the piston-cylinder assembly is of the order of 10^-5 per °C, so, for u(ΔT) ~1 °C, the relative contribution to u(p) is of the same order, which may be significant.

For pneumatic systems, where ρ_f is ~1000 times smaller than for oil, the head correction, ρ_f·g·Δh, is often negligible. The uncertainty in the head correction is typically dominated by u(Δh), i.e., u(ρ_f) and u(g) are negligible. If h ~0.27 m (as in the Fluke P3830 pressure balance) and u(Δh) is 0.025 m (25 mm, which is not very conservative), the effect on the relative uncertainty at 4 MPa(g) makes it one of the two largest contributors. At higher pressures, the Σm·g/A_0 term dominates and U(head) becomes less important (e.g., a much smaller relative contribution to u(p) at 70 MPa(g)).

In conclusion, the following DWT uncertainty contributors may be significant, and should be included in the uncertainty budget (UB):

  • u(g) (sensitivity coefficient ≈ p/g)
  • u(A_0) (sensitivity coefficient ≈ -p/A_0)
  • u(ΔT) (sensitivity coefficient ≈ -p·α)
  • for hydraulic systems, u(Δh) (sensitivity coefficient ≈ ρ_f·g)

The resolution, hysteresis and zero stability of the gauge being calibrated should also be included in the UB, and will often dominate the combined uncertainty (especially at low pressures).

(Contact the author at lmc-solutions.co.za.)

Control of measuring equipment – the requirements of ISO 9001

Once a measuring instrument is purchased and successfully installed in an ISO 9001 certified organization, the responsible staff member might consider his job done. However, to satisfy ISO 9001’s requirements for control, measuring equipment used for product and process verification requires attention throughout its lifespan. This article discusses what needs to be done, and how to do it thoroughly, but cost-effectively. (Note: Clause numbers below are from ISO 9001:2008.)

In section 7 (Product realization) of ISO 9001, clause 7.6 deals with “Control of monitoring and measuring equipment”. Firstly, it states the purpose of monitoring and measuring, namely, “to provide evidence of conformity of product to determined requirements”.

“Where necessary to ensure valid results, measuring equipment shall
a) be calibrated or verified, or both, at specified intervals, or prior to use, against measurement standards traceable to … national measurement standards… the organization shall assess and record the validity of the previous measuring results when the equipment is found not to conform to requirements… Records of the results of calibration and verification shall be maintained.”

Basically, the equipment user must manage his risk of making incorrect measurements (“control”). As zero risk is impossible (risk management means reducing risk to a tolerable level), he must also be able to assess the impact when measuring equipment is found to have drifted outside required limits. As in all quality management systems (QMS), record-keeping is essential.

“Verification” means “provision of objective evidence that a given item fulfils specified requirements” [International vocabulary of metrology (VIM), clause 2.44]. In this case, the item is a measuring system that is required to be sufficiently accurate. (To be more precise, the measurement results must be traceable to a national measurement standard, with a small enough measurement uncertainty.)

The equipment user must decide the interval between calibrations or verifications, taking into account regulations, conditions of use, advice from measurement experts, etc.

A calibration or verification is “traceable” [VIM, 2.41] when it is
(i) performed by competent personnel,
(ii) according to a documented & validated procedure (work instruction, SOP),
(iii) using traceable measuring equipment,
(iv) with a proper estimation of measurement uncertainty.

(i) How are personnel proven to be competent? By suitable training records.
(ii) How is a calibration procedure validated? By data, showing the capability of the personnel, procedure and equipment to achieve the claimed measurement uncertainty. Typically, a proficiency test (“PT”) or interlaboratory comparison is carried out.
(iii) How is the measuring equipment used to perform calibration (“measurement standards”) made traceable? It is itself calibrated, by
1) a National Metrology Institute (NMI) with a suitable Calibration and Measurement Capability (CMC) for this parameter published in the BIPM Key Comparison Database (KCDB), or,
2) a calibration lab accredited to ISO 17025 by an ILAC affiliate [ILAC P10:01/2013, sections 2 & 3].
(iv) How is the uncertainty of measurement estimated? The calibration procedure usually lists the main components of uncertainty and describes the manner in which they may be estimated. These components are combined using an internationally agreed method [EA-4/02 Evaluation of the Uncertainty of Measurement in Calibration].
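The essence of that method is to multiply each standard uncertainty by its sensitivity coefficient and combine the products in quadrature (root-sum-of-squares), then expand with a coverage factor. A minimal sketch, with made-up component names and values:

```python
import math

# Illustrative uncertainty budget: each entry is c_i * u(x_i), i.e. the
# component already multiplied by its sensitivity coefficient, at k=1,
# in the unit of the measurand. Names and values are made up.
budget = {
    "reference standard": 0.05,
    "resolution": 0.03,
    "repeatability": 0.04,
}

u_c = math.sqrt(sum(u**2 for u in budget.values()))  # combined standard uncertainty
U = 2 * u_c                                          # expanded uncertainty, k=2 (~95 %)
print(f"u_c = {u_c:.3f}, U(k=2) = {U:.3f}")
```

Note that quadrature addition means the largest component dominates: here the 0.05 contribution sets most of the 0.071 combined value.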

Note the distinction between
a) measuring equipment used for product and process verification, and
b) measuring equipment used to perform calibration (“measurement standards”).
Item b) might be, for example, a “master gauge” used to check (calibrate or verify) other gauges in the factory: This measurement standard’s traceability should satisfy ILAC requirements, including accreditation of the calibrating laboratory to ISO 17025.
Item a) is all the other gauges used in the factory: These items may be calibrated “in-house” (even if the organization is not accredited as a calibration laboratory), but the process must still satisfy requirements (i) to (iv) [SANAS TR 25, section 3.3].

How to satisfy ISO 9001’s calibration/verification requirement in a cost-effective manner:

1. If possible, consider this “continuing cost of ownership” aspect when selecting which equipment to purchase. (This applies not only to measuring equipment, but also to the systems in which they are installed.) For example, sensors with “standard” dimensions and fittings will probably fit more easily into calibration apparatus. An additional access port, allowing connection of a measurement standard next to the working measuring equipment, may allow in situ calibration, saving downtime and effort.

2. Choose measuring equipment with appropriate drift specifications: better specs than necessary will mean unjustified extra expense, while equipment with poor specs will require more frequent calibration/verification with attendant wastage of time and money. This highlights the value of understanding how the various uncertainties in system behaviour affect the time or cost of the process and the quality of the final product. For example, it is probably not useful to install a temperature sensor accurate to 0.1 °C if the temperature controller has a cycle of ±5 °C; a sensor accurate to 1 °C or 1.5 °C would probably do.

3. Consider cost and manhours when choosing external (accredited) or in-house calibration. Remember that in-house calibration requires personnel to be trained, a procedure (including estimation of measurement uncertainty) to be documented, measurement standards to be available and records to be kept. Most organizations adopt a “hybrid” approach: Master gauges, and equipment requiring complex calibration procedures, are calibrated by an external, ISO 17025-accredited, calibration laboratory. Low-accuracy, simple (“factory floor”) measuring equipment is calibrated in-house.
This approach is practical, as the organization’s own personnel need not undergo high-level metrology training, and it is cost-effective, as expensive, high-accuracy, measurement standards need not be maintained in-house, while the bulk of measuring equipment need not be submitted to costly external calibration.

An example – Calibration of Resistance Temperature Devices (RTDs):

The organization chooses to purchase one master RTD with digital display (“master thermometer”). They have it calibrated by an ISO 17025-accredited calibration laboratory, over their operating temperature range (-40 °C to 200 °C), with a calibration uncertainty of 0.1 °C. They decide to send it out for recalibration annually, as the manufacturer’s one-year drift specification for the display is 0.2 °C, and they want to use it to calibrate their working RTDs to an uncertainty of ≤ 0.3 °C.

They also procure a one litre, wide-mouth vacuum flask and an ice crusher, so that they are able to prepare an ice point from available ice cubes and tap water. After experimenting on how best to achieve this with their equipment, they prepare a procedure describing preparation of the ice point and measurement of an RTD therein. The procedure highlights sources of error which may significantly affect the result and explains how to prevent or detect such errors. It also explains how to estimate the uncertainty of calibration of the RTD.

Having received their master thermometer with its calibration certificate, they measure it at the ice point according to their procedure (and estimate the uncertainty of measurement). They then prepare a report comparing their result to that of the external, accredited calibration laboratory at 0 °C: this constitutes an interlaboratory comparison (ILC), demonstrating the measurement capability of their personnel, procedure and equipment when calibrating an RTD in an ice point. The senior staff member files this ILC report in his training records, as evidence of his competence. Junior staff members then compare their own measurement results to that of the senior staff member (intra-laboratory comparison), with the criterion for acceptance being that the results agree within the combined uncertainties. Based on this documented evidence, the organization authorises the relevant staff members to perform in-house calibration of RTDs.

The junior staff members then perform in-house calibration of the organization’s working RTDs at 0 °C every three months (the interval being chosen to adequately manage the risk of measurement error) and record the results in a controlled document, with control limits of ±0.3 °C being imposed. (In other words, a working RTD passes if it reads between -0.3 °C and +0.3 °C at the ice point.) The working RTDs are of tolerance class C, that is, at the time of manufacture they complied with the temperature/resistance tables published in IEC 60751 within ±(0.6 + 0.01∙|t|) °C, over the range (-50 to 600) °C. As RTDs falling within these limits are acceptable to the organization, the single-point calibration/verification demonstrating compliance at 0 °C (difference < 0.3 °C, with expanded uncertainty of 0.3 °C) is judged to be sufficient to control temperature measurement accuracy over the full range of operation. (The master thermometer is calibrated over this full range, in case any working RTD exhibits suspicious results and it is desired to perform further evaluation at other temperatures.)
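The class C tolerance formula quoted above is easy to evaluate at any temperature. A small sketch (the function name is ours; the range check follows the (-50 to 600) °C validity range):

```python
def class_c_tolerance(t_degC):
    """IEC 60751 class C tolerance, in degC, valid for -50 <= t <= 600 degC."""
    if not -50 <= t_degC <= 600:
        raise ValueError("outside class C validity range")
    return 0.6 + 0.01 * abs(t_degC)

# At the ice point the tolerance is 0.6 degC, i.e. twice as wide as the
# organization's +/-0.3 degC control limit.
print(class_c_tolerance(0))    # 0.6
print(class_c_tolerance(200))  # 2.6
```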


Uncertainty of measurement: water vapour transmission of materials

The rate at which water vapour is transmitted through a sheet material is important, for example, if the sheet is to be used to damp-proof a building. The documentary standards ASTM E96 and SANS 952-1 describe methods of evaluating water vapour transmission (WVT), using a desiccant or water to establish a humidity gradient across the sheet. The following article will briefly describe the method, and then focus on estimation of the uncertainty of the calculated WVT value.

The Desiccant Method [E96 clause 4.1] involves sealing the specimen to be tested over the mouth of an impermeable test dish, with desiccant (anhydrous calcium chloride) inside. This dish is placed inside a test chamber or enclosure, where the temperature and humidity are controlled at (38 ± 1) °C and (95 ± 5) %rh [952-1 clause 6.11.1.5], for example. The dish is briefly removed from the chamber and weighed, periodically, over a period of several days. The mass gain indicates the amount of water vapour moving through the specimen into the desiccant. Typically, three specimens are tested [E96 clause 9.1], with a fourth “dummy” (control) specimen mounted on a dish and weighed, but without any desiccant or water in the dish to maintain a humidity gradient [E96 clause 9.6]. The mass change of the dummy is subtracted from the change in each of the test specimens, to cancel out the effect of variations in temperature and air buoyancy [E96 clause 11.3]. (In practice, vigorous air circulation within the test chamber, required to keep the surrounding air uniform in temperature and humidity [E96 clause 6.2], causes the dummy to gain significant mass, too, because it was sealed at a lower ambient humidity than the 95 %rh maintained in the chamber.)

Measurements of specimen mass are performed (perhaps once every 24 hours), until approximately six consecutive changes in mass (corrected for variation in the dummy), plotted versus time, fall “on a reasonably straight line” [952-1 clause 6.11.4.2.3]. The slope of this line (equal to rate of mass change, in grams per hour) is converted to WVT in grams per square metre per 24 hours [952-1 clause 6.11.5], using the measured area of the mouth of each dish. The mean of the WVT values of the three specimens is reported to the customer.

Here is a data set, for use in our discussion below:

Time   Dummy                 Specimen 1                         Specimen 2                         Specimen 3
(h)    m (g)      dm (g)     m (g)      dm (g)   dm_corr (g)    m (g)      dm (g)   dm_corr (g)    m (g)      dm (g)   dm_corr (g)
0      163.4055   0.0000     197.6220   0.0000   0.0000         200.0488   0.0000   0.0000         190.0098   0.0000   0.0000
24     163.4061   0.0006     197.6260   0.0040   0.0034         200.0545   0.0057   0.0051         190.0167   0.0069   0.0063
48     163.4112   0.0057     197.6346   0.0126   0.0069         200.0615   0.0127   0.0070         190.0238   0.0140   0.0083
72     163.4148   0.0093     197.6408   0.0188   0.0095         200.0685   0.0197   0.0104         190.0326   0.0228   0.0135
120    163.4165   0.0110     197.6462   0.0242   0.0132         200.0753   0.0265   0.0155         190.0392   0.0294   0.0184
144    163.4182   0.0127     197.6554   0.0334   0.0207         200.0847   0.0359   0.0232         190.0471   0.0373   0.0246

Fitted slope (g/h):         0.000130                           0.000145                           0.000158
Area (mm^2):                3117                               3117                               3117
WVT (g/m^2/24h):            1.00                               1.12                               1.21


The slopes above were fitted using the LINEST spreadsheet function. Using the formula =LINEST(measured_y_values, measured_x_values, intercept, stats), with intercept=TRUE (allowing a non-zero y-intercept) and stats=TRUE (displaying not just the fitted values of slope and intercept, but also four additional rows of statistical information), for specimen 1, the following output is obtained:

0.000 130     0.000 11
0.000 013     0.001 09
0.963         0.001 59
103.10        4
0.000 26      0.000 01

The first row gives the fitted slope and y-intercept; the second row the standard errors (uncertainties at coverage factor k=1) of the fitted slope and intercept; the third row the coefficient of determination R^2 and the standard error in the y-values; the fourth row the Fisher F-statistic and the degrees of freedom (number of data points minus number of fitted parameters); and the fifth row the regression and residual sums of squares, which are not needed for this discussion. We will discuss the use of some of these statistical measures below.
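For readers without a spreadsheet at hand, the LINEST results for specimen 1 can be reproduced from the standard least squares formulas. The sketch below (plain Python, using the specimen 1 data from the table) returns the same slope, standard error, R^2 and F:

```python
import math

# Specimen 1 data: time (h) and dummy-corrected mass gain (g).
x = [0, 24, 48, 72, 120, 144]
y = [0.0000, 0.0034, 0.0069, 0.0095, 0.0132, 0.0207]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = my - slope * mx

# Residual statistics (dof = 6 points - 2 fitted parameters = 4).
ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - my) ** 2 for yi in y)
dof = n - 2
s_y = math.sqrt(ssr / dof)        # standard error of the y-values
se_slope = s_y / math.sqrt(sxx)   # standard error of the slope (k=1)
r2 = 1 - ssr / sst
f_stat = (sst - ssr) / (ssr / dof)

print(f"slope = {slope:.6f} g/h, u(slope, k=1) = {se_slope:.6f} g/h")
print(f"R^2 = {r2:.3f}, F = {f_stat:.1f}, dof = {dof}")
```

Running this gives slope 0.000130 g/h, standard error 0.000013 g/h, R^2 = 0.963 and F = 103.1, matching the LINEST output above.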

How may we estimate the uncertainty of measurement (UoM) of this WVT value? First, consider that the basic task is to determine the change in mass over the change in time, or Δm/Δt. The absolute accuracy of the balance used to weigh the dishes is not important, nor are the absolute times indicated by the watch: only the changes are of interest.

How can we estimate the accuracy of Δm? We could evaluate the sensitivity of the balance [OIML R 76-1 clause T.4.1] to small changes in a load around 200 g (in the above example). However, there is an easier way, which also takes into account any effects that vary randomly from one balance reading to the next. As we have six data points to determine only two unknowns (slope and y-intercept of the straight line), we have what is called an “over-determined system”, for which we find the “best fit” line by the method of least squares. This line typically does not pass through any of the data points, but is “as close as possible” to all of them (in other words, the best compromise to all available data). The differences between the measured data points and the fitted line give a quantitative estimate of random errors in the measured values. These differences are conveniently combined in the standard error of the fitted slope, namely, 0.000 013 g/h for specimen 1 above.

How do we expand this standard (k=1) uncertainty to reach a level of confidence of approximately 95%? For a very large data set, we would simply multiply u(k=1) by the coverage factor k=2. However, as our data set is small, we should enlarge the coverage factor somewhat, to compensate for our limited knowledge. Student’s t-distribution tells us what this k-value should be: the spreadsheet function =TINV(1-0.95, 4) gives the value of Student’s t-distribution for 4 degrees of freedom and a 95% level of confidence, namely, 2.8. Multiplying u(k=1) = 0.000 013 g/h by 2.8, we obtain the expanded uncertainty 0.000 036 g/h. (This is what U(k=2) would have been, if we had a very large data set, or, in other words, almost infinite degrees of freedom.) So, the slope for specimen 1 is (0.000 130 ± 0.000 036) g/h, or, as a relative uncertainty, U(slope)/slope ≈ 27%.
Note that this is significantly worse than the balance sensitivity of 1% of total mass gain (which would translate to 0.000 001 g/h) required in the test methods [E96 clause 6.3, 952-1 clause 6.11.4.2.2], indicating that balance sensitivity is, in this case, negligible compared to other factors causing “noise” in the mass readings. It highlights the importance of having more data points than unknowns and performing a least squares fit, as this uncertainty component would otherwise be grossly underestimated. It also shows how the standard error in the slope is far more useful than the correlation coefficient R^2, in quantifying uncertainties.
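The expansion step is easily checked numerically. The sketch below takes the TINV value of 2.776 (rounded to 2.8 in the text) as given, and uses the unrounded fit results for the slope and its standard error:

```python
# Expanding the k=1 slope uncertainty to ~95 % confidence.
k = 2.776           # Student's t for 4 degrees of freedom, 95 % (from =TINV(0.05, 4))
u_slope = 1.281e-5  # g/h, standard error (k=1) of the fitted slope, unrounded
slope = 1.3005e-4   # g/h, fitted slope, unrounded

U_slope = k * u_slope        # expanded uncertainty
rel = U_slope / slope        # relative expanded uncertainty
print(f"U(slope) = {U_slope:.2e} g/h, relative = {rel:.0%}")
```

This reproduces the 0.000 036 g/h expanded uncertainty and the 27% relative value quoted above.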

Now, how do we estimate the uncertainty in Δt? (In the previous paragraph, it seems we already obtained a “complete” uncertainty for the slope. However, the fitting procedure we used assumes no uncertainty in the x-values, only uncertainty in the y-values, so we had better look at the uncertainty in the time intervals, too.) The documentary standards require time intervals to be measured to an accuracy of 1% (for example, a 24 hour interval to an accuracy of 15 minutes), so u(Δt)/Δt ≤ 0.01 [E96 clause 11.3, 952-1 clause 6.11.4.2.2]. We can see that the relative uncertainty in Δt is far smaller than that in the fitted slope, so we expect the uncertainty in the slope to completely dominate the final, combined, uncertainty.

The formula for WVT is WVT = (Δm/Δt)∙24/A, where Δm/Δt is the fitted slope (in g/h), the factor 24 converts to a per-24-hour rate, and A is the dish-mouth area (in m^2). When parameters are combined by simple multiplication or division, the relative uncertainties may be added simply in quadrature. So, U(WVT)/WVT = √[(U(slope)/slope)^2 + (U(A)/A)^2 + (U(Δt)/Δt)^2]. As U(A)/A is typically smaller than 0.01, it, like U(Δt)/Δt, can be neglected, so that the final relative uncertainty is U(WVT)/WVT ≈ U(slope)/slope, or 27%. In other words, U(k=2) = 1.00 g/m^2/24h × 0.27 = 0.27 g/m^2/24h.
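Putting specimen 1’s numbers through the WVT conversion and the quadrature combination (variable names are ours; the 1% relative uncertainties for area and time are taken at the limits mentioned in the standards):

```python
import math

slope = 0.000130   # g/h, fitted slope for specimen 1
A = 3117e-6        # m^2, dish-mouth area (3117 mm^2)

wvt = slope * 24 / A   # g/m^2/24h

rel_slope = 0.27   # relative expanded uncertainty of the slope
rel_A = 0.01       # relative uncertainty of the area (at the 1 % limit)
rel_dt = 0.01      # relative uncertainty of the time intervals (at the 1 % limit)

rel_wvt = math.sqrt(rel_slope**2 + rel_A**2 + rel_dt**2)
print(f"WVT = {wvt:.2f} g/m^2/24h, U(k=2) = {rel_wvt * wvt:.2f} g/m^2/24h")
```

The two 1% terms barely move the combined value (0.2704 vs 0.27), confirming that the slope uncertainty dominates.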

The above is the UoM for specimen 1. Specimens 2 and 3 have similar uncertainties (0.29 g/m^2/24h and 0.25 g/m^2/24h, respectively). Now, what is the uncertainty of the mean of the three specimens? If the three uncertainties are similar in magnitude, the uncertainty of the mean is U/√3 ≈ 0.27/√3 ≈ 0.16 g/m^2/24h. If the uncertainties are quite different, the weighted mean (giving more weight to those specimens with smaller uncertainties) could be used: WVT_w = Σ(WVT_i/U_i^2)/Σ(1/U_i^2). The uncertainty of the weighted mean is U(WVT_w) = 1/√[Σ(1/U_i^2)], which also works out to be 0.16 g/m^2/24h, in the above example.
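Both the simple and weighted means can be checked in a few lines (variable names are ours):

```python
import math

wvt = [1.00, 1.12, 1.21]   # g/m^2/24h, per specimen
U = [0.27, 0.29, 0.25]     # expanded uncertainties (k ~ 2)

# Simple mean, with the three uncertainties treated as similar in magnitude:
mean = sum(wvt) / len(wvt)
U_mean = (sum(U) / len(U)) / math.sqrt(len(wvt))

# Weighted mean, giving more weight to specimens with smaller uncertainties:
w = [1 / u**2 for u in U]
wmean = sum(wi * xi for wi, xi in zip(w, wvt)) / sum(w)
U_wmean = 1 / math.sqrt(sum(w))

print(f"mean: {mean:.2f} +/- {U_mean:.2f} g/m^2/24h")
print(f"weighted mean: {wmean:.2f} +/- {U_wmean:.2f} g/m^2/24h")
```

With uncertainties this similar, both routes give essentially the same answer (0.156 vs 0.155, each rounding to 0.16 g/m^2/24h), as noted above.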