Many statistical techniques used for the treatment of quantitative
data are sensitive to the presence of outliers. Even simple calculations, such as
the mean and standard deviation of a set of data, may be
distorted by a single outlying point. Checking for outliers should therefore be
a routine part of any data analysis.
A commonly used statistical test is Dixon’s Q-test, which we presented
in a previous post entitled “Detection of a Single Outlier - Statistical Analysis - Quantitative Data”.
Another similar but more robust test for the detection of outliers
is Grubbs’ test. It is now considered a more accurate test than Dixon’s Q-test.
Grubbs’ test^{1} is used to detect a single outlier in a
data set of N values that are nearly normally distributed. The test is
essentially based on the criterion of “distance of the suspected value from the
mean of the data set compared with the standard deviation”.
The test is performed by computing the Grubbs statistic G_{exp}, which is defined as:
G_{exp} = |x_{outlier} - x̅| / s (1)
Where:
x_{outlier} is the suspected outlier
x̅ is the mean of the N values
s is the standard deviation of the N values
If the calculated G_{exp} is found to be:
 G_{exp} < G, then the point in question must be retained
 G_{exp} > G, then the point in question must be discarded and the mean and standard deviation must be recalculated.
Here G is found from statistical tables (see Table I.1) for different
levels of confidence and numbers of data points.
How is Grubbs’ test applied?
The test is very simple and is applied as follows:
 Order the N data values comprising the set of observations under examination in increasing order:
x_{1} < x_{2} < x_{3} < … < x_{N}
 Calculate the average of the data values x̅ and the standard deviation s
 Calculate the experimental G_{exp}, as defined in equation (1)
 Compare the value of G_{exp} with the critical value G_{critical} found in tables. The critical value should correspond to the confidence level at which we have decided to run the test (usually 95% confidence).
If the calculated G_{exp} is found to be:
1) G_{exp} < G_{critical}, then the point in question must be retained
2) G_{exp} > G_{critical}, then the point in question must be discarded and the mean and standard deviation must be recalculated.
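The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `grubbs_statistic` and `grubbs_test` are made up for this example, the small lookup dictionary holds only a few 95% values copied from Table I.1, and the suspected value is assumed to be the one farthest from the mean (which is always one of the two ends of the ordered data).

```python
import statistics

# A few 95% critical values copied from Table I.1 (N -> G_critical).
# Illustrative subset only; extend it from the table as needed.
G_CRITICAL_95 = {3: 1.15, 4: 1.46, 5: 1.67, 6: 1.82,
                 7: 1.94, 8: 2.03, 9: 2.11, 10: 2.18}


def grubbs_statistic(values):
    """Return (suspect, G_exp), where suspect is the value farthest from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1 in the denominator)
    if s == 0:
        raise ValueError("all values are identical; G_exp is undefined")
    suspect = max(values, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / s  # equation (1)


def grubbs_test(values, critical_table=G_CRITICAL_95):
    """Return (suspect, G_exp, G_critical, reject) for a single application of the test."""
    n = len(values)
    if n not in critical_table:
        raise ValueError(f"no critical value tabulated for N = {n}")
    suspect, g_exp = grubbs_statistic(values)
    g_critical = critical_table[n]
    return suspect, g_exp, g_critical, g_exp > g_critical
```

Calling `grubbs_test([1.0, 2.0, 3.0, 4.0, 5.0])` returns a `reject` flag of `False`, since no value lies far enough from the mean relative to the standard deviation.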
A table containing G_{critical} values for different
confidence levels (95%, 97.5%, 99%) and
numbers of data points N (3–100) is given
below:
Table I.1: Critical values of the G-test^{1}
N      G_{critical} (95%)**   G_{critical} (97.5%)**   G_{critical} (99%)**
3      1.15                   1.15                     1.15
4      1.46                   1.48                     1.49
5      1.67                   1.71                     1.75
6      1.82                   1.89                     1.94
7      1.94                   2.02                     2.10
8      2.03                   2.13                     2.22
9      2.11                   2.21                     2.32
10     2.18                   2.29                     2.41
11     2.23                   2.36                     2.48
12     2.29                   2.41                     2.55
13     2.33                   2.46                     2.61
14     2.37                   2.51                     2.66
15     2.41                   2.55                     2.71
16     2.44                   2.59                     2.75
17     2.47                   2.62                     2.79
18     2.50                   2.65                     2.82
19     2.53                   2.68                     2.85
20     2.56                   2.71                     2.88
21     2.58                   2.73                     2.91
22     2.60                   2.76                     2.94
23     2.62                   2.78                     2.96
24     2.64                   2.80                     2.99
25     2.66                   2.82                     3.01
30     2.75                   2.91
35     2.82                   2.98
40     2.87                   3.04
45     2.92                   3.09
50     2.96                   3.13
60     3.03                   3.20
70     3.09                   3.26
80     3.14                   3.31
90     3.18                   3.35
100    3.21                   3.38
** The percentage expresses the confidence level.
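For readers wondering where such critical values come from: the post itself does not say how the table was generated, but the tabulated values appear consistent with the standard one-sided expression based on Student's t distribution, written out below. The symbol t_{α/N, N-2} is notation introduced here for illustration; it denotes the upper critical value of the t distribution with N - 2 degrees of freedom at significance level α/N (α/(2N) would be used for a two-sided test).

```latex
G_{critical} = \frac{N-1}{\sqrt{N}}
\sqrt{\frac{t_{\alpha/N,\,N-2}^{2}}{N-2+t_{\alpha/N,\,N-2}^{2}}}
```

As a rough check, for N = 3 and α = 0.05 the t value is about 19.1, which gives G_{critical} ≈ 1.15, in agreement with the first row of the table.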
Are there any limitations to Grubbs’ test?
2. Grubbs’ test is valid for the detection of a single outlier (it cannot be used a second time on the same set of data).
3. Grubbs’ test should be applied with caution (the same applies to all statistical tests used for rejecting data), since there is a probability, equal to the significance level α (α = 0.05 at the 95% confidence level), that a value identified by the test as an outlier is actually not an outlier.
4. The mean and the standard deviation s of the values in the data set must be calculated. In cases where it is desirable to avoid calculating the standard deviation, or where a quick judgment is called for, Dixon’s Q-test may be used instead.
A typical example with a possible outlier value was given in a previous post entitled “Calibration and Outliers - Statistical Analysis”.
Can we reject the 0.6400 value (please see Table I.1 in “Calibration and Outliers - Statistical Analysis”) as an outlier at a 95% confidence level using Grubbs’ test?
By following the above procedure we get the following:
The data excluding the possible outlier are almost normally
distributed, as shown in Fig. 1b in “Calibration and Outliers - Statistical Analysis”.
Arrange the data under examination in increasing order:
0.5980 0.5993 0.5995 0.5997 0.6010 0.6400
Calculate the mean of the data values and the standard deviation:
x̅ = 0.6062, s = 0.0166
Calculate G_{exp} using equation (1):
G_{exp} = (0.6400 - 0.6062) / 0.0166 = 2.04
Compare with the critical value G_{critical} found in
Table I.1 at the 95% confidence
level and for N = 6 observations. This value is G_{critical} = 1.82.
G_{exp} = 2.04 > G_{critical} = 1.82, and therefore we can reject 0.6400 at the 95% confidence
level, with a probability α < 0.05 that our decision is false.
In a previous post, Dixon’s Q-test
also showed that the value 0.6400 is an outlier.
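The arithmetic of this worked example can be rechecked with a short Python sketch that uses only the standard library. The variable names are chosen here for illustration, and the critical value 1.82 is simply copied from Table I.1 for N = 6 at 95% confidence.

```python
import statistics

# Data from the worked example, with 0.6400 as the suspected outlier.
data = [0.5980, 0.5993, 0.5995, 0.5997, 0.6010, 0.6400]

suspect = max(data)             # the suspected value is the largest one here
mean = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation (N - 1 in the denominator)

g_exp = abs(suspect - mean) / s  # equation (1)
g_critical = 1.82                # Table I.1, N = 6, 95% confidence

print(f"s = {s:.4f}, G_exp = {g_exp:.2f}")
print("reject" if g_exp > g_critical else "retain")
```

With these six values, G_exp rounds to 2.04, which exceeds 1.82, so the script reports that 0.6400 is rejected, in line with the hand calculation.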
References
1.
F. E. Grubbs, Technometrics, 11, 1–21 (1969)
Hello! Do you know the equation used to derive those G critical values?
Please check the original Grubbs et al. paper in Technometrics, 14, 847–854 (1972) or the following reference book by Michael Thompson, Philip James Lowthian "Notes on Statistics and Data Quality for Analytical Chemists" page 135 (in Google Books)
What if your G value EQUALS the critical value?