A problem often encountered when making replicate measurements of a physical or chemical quantity is deciding whether an outlying result is far enough from the rest of the data to justify discarding it. There is always a temptation to eliminate such results and exclude them from further calculations on the basis of "good judgment" and "common sense".
While "good judgment" and "common sense" are valuable tools in interpreting results in quantitative analysis, the rejection of possible outlying data must be based on objective criteria, that is, on the statistical treatment of the data.
Are there any simple statistical tests for rejecting outliers in quantitative data?
There are several simple tests that we can use to handle suspect values and to identify them as outliers at a particular confidence level, such as:
 Dixon's Q-test (for a single outlier, data normally distributed, small data sets)
 Grubbs' test (for a single outlier, small data sets, data nearly normally distributed)
 Tietjen-Moore test (generalization of Grubbs' test to the case of more than one outlier; the number of outliers must be specified exactly)
 Huber's method (for multiple outliers, data roughly normally distributed)
All these tests have strong points and limitations and therefore must be used judiciously.
A typical example with a possible outlier value was given in a previous post entitled “Calibration and Outliers - Statistical Analysis”.
The most commonly used statistical test for identifying outliers is Dixon’s Q-test. The Q-test compares the difference between the suspected outlier and its nearest numerical neighbor to the range of the entire data set.
How is the Q-test applied?
The test is very simple and is applied as follows:
 Order the N data values comprising the set of observations under examination in increasing order:
x_{1} < x_{2} < x_{3} < … < x_{N}
 Calculate the experimental Q (Q_{exp}). Q_{exp} is defined as follows:
Q_{exp} = (suspect value – nearest neighbor) / (largest value – smallest value)    (1)
 Compare Q_{exp} with the critical value Q_{critical} found in tables. The critical value should correspond to the number of observations N and to the confidence level at which we have decided to run the test (usually 95%).
If Q_{exp} > Q_{critical}, then the suspect value is an outlier and can be rejected.
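The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `q_experimental` and `is_outlier` and the `suspect` parameter are assumptions introduced here, and the critical value is passed in by the caller rather than looked up.

```python
def q_experimental(values, suspect="high"):
    """Return Q_exp for the largest ("high") or smallest ("low") value.

    Q_exp = (suspect value - nearest neighbor) / (largest - smallest),
    taken as a positive gap so it works for either end of the data.
    """
    data = sorted(values)  # step 1: order the observations
    if len(data) < 3:
        raise ValueError("the Q-test needs at least 3 observations")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q_exp is undefined")
    if suspect == "high":
        gap = data[-1] - data[-2]  # largest value vs. its neighbor
    elif suspect == "low":
        gap = data[1] - data[0]    # smallest value vs. its neighbor
    else:
        raise ValueError("suspect must be 'high' or 'low'")
    return gap / data_range        # step 2: equation (1)


def is_outlier(values, q_critical, suspect="high"):
    """Step 3: reject the suspect value only if Q_exp > Q_critical."""
    return q_experimental(values, suspect) > q_critical


# Invented numbers, for illustration only (not measured data).
print(q_experimental([1, 2, 3, 4, 10]))         # gap 6 over range 9
print(is_outlier([1, 2, 3, 4, 10], 0.710))      # 0.710: N = 5, 95% level
```

Passing the critical value explicitly keeps the calculation separate from the choice of confidence level, which remains a decision for the analyst.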
A table containing Q_{critical} for different confidence levels (90%, 95%, 96%, 98%, 99%) and numbers of observations N (3 to 10) is given below:
Table I.1: Critical values of the Q-test^{1}

| N | Q_{critical} (90%)** | Q_{critical} (95%)** | Q_{critical} (96%)** | Q_{critical} (98%)** | Q_{critical} (99%)** |
|---|---|---|---|---|---|
| 3 | 0.941 | 0.970 | 0.976 | 0.988 | 0.994 |
| 4 | 0.765 | 0.829 | 0.846 | 0.889 | 0.926 |
| 5 | 0.642 | 0.710 | 0.729 | 0.780 | 0.821 |
| 6 | 0.560 | 0.625 | 0.644 | 0.698 | 0.740 |
| 7 | 0.507 | 0.568 | 0.586 | 0.637 | 0.680 |
| 8 | 0.468 | 0.526 | 0.543 | 0.590 | 0.634 |
| 9 | 0.437 | 0.493 | 0.510 | 0.555 | 0.598 |
| 10 | 0.412 | 0.466 | 0.483 | 0.527 | 0.568 |

** The percentage expresses the confidence level
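For use in a script, the critical values of Table I.1 can be held in a small lookup structure. The sketch below is one possible layout, not a standard API: the names `Q_CRITICAL` and `q_critical` are assumptions made for illustration, and the numbers are copied from the table above.

```python
# Critical values of Q from Table I.1, keyed by N and then by
# confidence level in percent.
Q_CRITICAL = {
    3:  {90: 0.941, 95: 0.970, 96: 0.976, 98: 0.988, 99: 0.994},
    4:  {90: 0.765, 95: 0.829, 96: 0.846, 98: 0.889, 99: 0.926},
    5:  {90: 0.642, 95: 0.710, 96: 0.729, 98: 0.780, 99: 0.821},
    6:  {90: 0.560, 95: 0.625, 96: 0.644, 98: 0.698, 99: 0.740},
    7:  {90: 0.507, 95: 0.568, 96: 0.586, 98: 0.637, 99: 0.680},
    8:  {90: 0.468, 95: 0.526, 96: 0.543, 98: 0.590, 99: 0.634},
    9:  {90: 0.437, 95: 0.493, 96: 0.510, 98: 0.555, 99: 0.598},
    10: {90: 0.412, 95: 0.466, 96: 0.483, 98: 0.527, 99: 0.568},
}


def q_critical(n, confidence=95):
    """Look up Q_critical for n observations at a tabulated confidence level."""
    try:
        return Q_CRITICAL[n][confidence]
    except KeyError:
        # The table only covers N = 3..10 and five confidence levels;
        # refuse to guess values outside it.
        raise ValueError(
            f"no tabulated value for N={n} at {confidence}% confidence"
        ) from None


print(q_critical(6))       # 95% level by default
print(q_critical(10, 99))
```

Raising an error for untabulated combinations, rather than interpolating, keeps the lookup faithful to the published table.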
Are there any limitations to Dixon’s Q-test?
1. The data, excluding the possible outlier, must be normally distributed.
2. The Q-test is valid for the detection of a single outlier (it cannot be used a second time on the same set of data). Other forms of Dixon’s Q-test can be applied to the detection of multiple outliers^{2}.
3. The Q-test, like all statistical tests used for rejecting data, should be applied with caution, since there is a probability equal to the significance level α (α = 0.05 at the 95% confidence level) that a value identified as an outlier by the Q-test is not actually an outlier.
Can we reject the 0.6400 value at a 95% confidence level (please see Table I.1 in “Calibration and Outliers - Statistical Analysis”) as an outlier using Dixon’s Q-test?
By following the above procedure we get the following:
1. The data excluding the possible outlier are almost normally distributed, as shown in Fig. 1b in “Calibration and Outliers - Statistical Analysis”.
2. Arrange the data under examination in increasing order:
0.5980 < 0.5993 < 0.5995 < 0.5997 < 0.601 < 0.6400
3. Calculate Q_{exp} using equation (1):
Q_{exp} = (suspect value – nearest neighbor) / (largest value – smallest value) = (0.6400 – 0.601) / (0.6400 – 0.5980) = 0.039 / 0.042 = 0.9286
4. Compare with the critical value Q_{critical} found in Table I.1 at the 95% confidence level and for N = 6 observations. This value is Q_{critical} = 0.625.
Since Q_{exp} = 0.9286 > Q_{critical} = 0.625, we can reject 0.6400 at the 95% confidence level, with a probability of less than 0.05 that this decision is wrong.
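The arithmetic of this example can be checked with a short script. It is a sketch for verification only: the variable names are chosen here for illustration, the six data values are those listed above, and the critical values are the N = 6 row of Table I.1.

```python
# The six replicate values from the worked example.
data = [0.5980, 0.5993, 0.5995, 0.5997, 0.601, 0.6400]

# Q_critical for N = 6 at each tabulated confidence level (Table I.1).
q_crit_n6 = {90: 0.560, 95: 0.625, 96: 0.644, 98: 0.698, 99: 0.740}

ordered = sorted(data)
# Equation (1), with the largest value as the suspect.
q_exp = (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])
print(f"Q_exp = {q_exp:.4f}")

# Compare Q_exp with the critical value at each confidence level.
for level, q_crit in q_crit_n6.items():
    verdict = "reject" if q_exp > q_crit else "retain"
    print(f"{level}%: Q_critical = {q_crit:.3f} -> {verdict} 0.6400")
```

Q_{exp} is roughly 0.93, which is larger than every entry in the N = 6 row, so the conclusion for 0.6400 does not depend on which of the tabulated confidence levels is chosen.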
An applet for doing Q-test calculations is available on the website of the University of Athens Department of Chemistry.
References
1. D. Harvey, “Modern Analytical Chemistry”, McGraw-Hill Companies Inc., 2000
2. D. B. Rorabacher, Anal. Chem., 63, 139–146 (1991)
3. R. D. Brown, “Introduction to Chemical Analysis”, McGraw-Hill Companies Inc., 1982