A problem often encountered when making replicate measurements of a physical or chemical quantity is deciding whether an outlying result is far enough from the rest of the data to justify discarding it. There is always a temptation to eliminate such outlying results and exclude them from any further calculations on the basis of "good judgment" and "common sense".
While "good judgment" and "common sense" are valuable tools in interpreting results in quantitative analysis, the rejection of possible outlying data must be based on objective criteria, that is, on statistical treatment of the data.
Are there any simple statistical tests for rejecting outliers in quantitative data?
There are several simple tests that we can use to handle suspect values and to identify them as outliers at a particular confidence level, such as:
- Dixon's Q-test (for a single outlier, data normally distributed, small data sets)
- Grubbs' test (for a single outlier, small data sets, data nearly normally distributed)
- Tietjen-Moore test (generalization of Grubbs' test to the case of more than one outlier; the number of outliers must be specified exactly)
- Huber's method (for multiple outliers, data roughly normally distributed)
All these tests have strong points and limitations and therefore must be used judiciously.
A typical example with a possible outlier value was given in a previous post entitled “Calibration and Outliers - Statistical Analysis”.
The most commonly used statistical test for identifying outliers is Dixon’s Q-test. The Q-test compares the difference between the suspected outlier and its nearest numerical neighbor to the range of the entire data set.
How is the Q-test applied?
The test is very simple and is applied as follows:
- Arrange the N data values comprising the set of observations under examination in increasing order:
x1 < x2 < x3 < … < xN
- Calculate the experimental Q (Qexp). Qexp is defined as follows:
Qexp = |(suspect value – nearest neighbor) / (largest value – smallest value)|   (1)
- Compare Qexp with the critical value Qcritical found in tables. The critical value should correspond to the confidence level at which we have decided to run the test (usually 95%).
If Qexp > Qcritical, the suspect value is identified as an outlier and can be rejected.
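The steps above can be sketched in Python. This is only a minimal illustration, not code from the original post: the function names `q_exp` and `is_outlier` and the `suspect` argument are assumptions made here, and the suspect value is taken to be either the smallest or the largest observation.

```python
def q_exp(values, suspect="high"):
    """Return Qexp from equation (1) for the smallest or largest value.

    `suspect` is a hypothetical argument: "high" tests the largest
    observation, "low" tests the smallest one.
    """
    x = sorted(values)                 # step 1: increasing order
    if len(x) < 3:
        raise ValueError("the Q-test needs at least 3 observations")
    data_range = x[-1] - x[0]          # largest value - smallest value
    if data_range == 0:
        raise ValueError("all observations are identical")
    if suspect == "high":
        gap = x[-1] - x[-2]            # suspect value - nearest neighbor
    elif suspect == "low":
        gap = x[1] - x[0]
    else:
        raise ValueError("suspect must be 'high' or 'low'")
    return abs(gap / data_range)       # step 2: Qexp


def is_outlier(q_experimental, q_critical):
    """Step 3: reject the suspect value only if Qexp > Qcritical."""
    return q_experimental > q_critical
```

The critical value is passed in explicitly because it depends on both N and the chosen confidence level and has to be read from a table.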
A table containing Qcritical for different confidence levels (90%, 95%, 96%, 98%, 99%) and numbers of observations N (3-10) is given below:
Table I.1: Critical values for the Q-test [1]
| N  | Qcritical (90%)** | Qcritical (95%)** | Qcritical (96%)** | Qcritical (98%)** | Qcritical (99%)** |
|----|-------------------|-------------------|-------------------|-------------------|-------------------|
| 3  | 0.941             | 0.970             | 0.976             | 0.988             | 0.994             |
| 4  | 0.765             | 0.829             | 0.846             | 0.889             | 0.926             |
| 5  | 0.642             | 0.710             | 0.729             | 0.780             | 0.821             |
| 6  | 0.560             | 0.625             | 0.644             | 0.698             | 0.740             |
| 7  | 0.507             | 0.568             | 0.586             | 0.637             | 0.680             |
| 8  | 0.468             | 0.526             | 0.543             | 0.590             | 0.634             |
| 9  | 0.437             | 0.493             | 0.510             | 0.555             | 0.598             |
| 10 | 0.412             | 0.466             | 0.483             | 0.527             | 0.568             |
** The percentage expresses the confidence level
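For use in a script, the values of Table I.1 could be stored in a small lookup structure. The sketch below is one possible layout, assuming a nested dictionary keyed by N and then by confidence level in percent; the names `Q_CRITICAL` and `q_critical` are made up for this illustration.

```python
# Critical values copied from Table I.1: Q_CRITICAL[N][confidence level in %]
Q_CRITICAL = {
    3:  {90: 0.941, 95: 0.970, 96: 0.976, 98: 0.988, 99: 0.994},
    4:  {90: 0.765, 95: 0.829, 96: 0.846, 98: 0.889, 99: 0.926},
    5:  {90: 0.642, 95: 0.710, 96: 0.729, 98: 0.780, 99: 0.821},
    6:  {90: 0.560, 95: 0.625, 96: 0.644, 98: 0.698, 99: 0.740},
    7:  {90: 0.507, 95: 0.568, 96: 0.586, 98: 0.637, 99: 0.680},
    8:  {90: 0.468, 95: 0.526, 96: 0.543, 98: 0.590, 99: 0.634},
    9:  {90: 0.437, 95: 0.493, 96: 0.510, 98: 0.555, 99: 0.598},
    10: {90: 0.412, 95: 0.466, 96: 0.483, 98: 0.527, 99: 0.568},
}


def q_critical(n, confidence=95):
    """Look up Qcritical for n observations at the given confidence level (%)."""
    try:
        return Q_CRITICAL[n][confidence]
    except KeyError:
        # The table only covers N = 3-10 and the five listed levels.
        raise ValueError(
            f"no tabulated Qcritical for N={n} at {confidence}% confidence"
        ) from None
```

Raising an error for untabulated combinations, rather than interpolating, keeps the lookup faithful to the table.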
Are there any limitations to Dixon’s Q-test?
1. The data, excluding the possible outlier, must be normally distributed.
2. The Q-test is valid for the detection of a single outlier (it cannot be used a second time on the same set of data). Other forms of Dixon’s Q-test can be applied to the detection of multiple outliers [2].
3. The Q-test should be applied with caution (the same applies to all statistical tests used for rejecting data), since there is a probability equal to the significance level α (α = 0.05 at the 95% confidence level) that a value identified as an outlier by the Q-test is actually not an outlier.
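The second limitation can be illustrated with a short Python sketch. The data below are hypothetical, invented here purely for illustration: two high values sit close together, so the gap between the largest value and its nearest neighbor is small and the simple Q-test does not flag either of them.

```python
# Hypothetical data (not from the post): two suspiciously high values.
data = sorted([0.5980, 0.5993, 0.5995, 0.5997, 0.6400, 0.6410])

# Qexp for the largest value, as in equation (1).
q_high = (data[-1] - data[-2]) / (data[-1] - data[0])

Q_CRITICAL_95_N6 = 0.625  # Table I.1, N = 6, 95% confidence level

print(f"Qexp = {q_high:.4f}")
print("rejected" if q_high > Q_CRITICAL_95_N6 else "retained")
```

Here Qexp is roughly 0.001 / 0.043, far below Qcritical, so the largest value is retained even though two of the six values lie well away from the other four.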
Can we reject the 0.6400 value (please see Table I.1 in “Calibration and Outliers - Statistical Analysis”) as an outlier at the 95% confidence level using Dixon’s Q-test?
By following the above procedure we get the following:
1. The data excluding the possible outlier are almost normally distributed, as shown in Fig. 1b in “Calibration and Outliers - Statistical Analysis”.
2. Arrange the data under examination in increasing order:
0.5980 0.5993 0.5995 0.5997 0.601 0.6400
3. Calculate Qexp using equation (1):
Qexp = |(suspect value – nearest neighbor) / (largest value – smallest value)| = |(0.6400 – 0.601) / (0.6400 – 0.5980)| = 0.039 / 0.042 = 0.9286
4. Compare Qexp with the critical value Qcritical found in Table I.1 above at the 95% confidence level for N = 6 observations. This value is Qcritical = 0.625.
Qexp = 0.9286 > Qcritical = 0.625, and therefore we can reject 0.6400 at the 95% confidence level, with a probability of less than 0.05 that this decision is wrong.
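The arithmetic of this worked example can be checked with a few lines of Python. This is a sketch only; the variable names are chosen here for illustration, and the critical value is typed in by hand from Table I.1.

```python
data = [0.5980, 0.5993, 0.5995, 0.5997, 0.601, 0.6400]
suspect = 0.6400
Q_CRITICAL_95_N6 = 0.625  # Table I.1, N = 6, 95% confidence level

ordered = sorted(data)

# Nearest neighbor: the closest of the remaining values to the suspect one.
others = ordered.copy()
others.remove(suspect)
nearest = min(others, key=lambda v: abs(v - suspect))

# Equation (1): |suspect - nearest neighbor| / (largest - smallest)
q_experimental = abs(suspect - nearest) / (ordered[-1] - ordered[0])
reject = q_experimental > Q_CRITICAL_95_N6

print(f"nearest neighbor = {nearest}")
print(f"Qexp = {q_experimental:.4f}")
print("reject 0.6400" if reject else "retain 0.6400")
```

Qexp rounds to 0.9286, which exceeds 0.625, so the script reaches the same decision as the manual calculation.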
An applet for doing Q-test calculations is available on the website of the University of Athens Department of Chemistry.
References
1. D. Harvey, “Modern Analytical Chemistry”, McGraw-Hill Companies Inc., 2000
2. D. B. Rorabacher, Anal. Chem., 63, 139–146 (1991)
3. R. D. Brown, “Introduction to Chemical Analysis”, McGraw-Hill Companies Inc., 1982