## Lines, Sines, and Curve Fitting 8 – D'Agostino

The eyeball and quick sigma population checks in the previous post provided some confidence that the global temperature anomalies are normally distributed about the mean. But there are more formal tests, including the D’Agostino normality test.

From wiki:

In statistics, D’Agostino’s K² test is a goodness-of-fit measure of departure from normality, that is the test aims to establish whether or not the given sample comes from a normally distributed population. The test is based on transformations of the sample kurtosis and skewness, and has power only against the alternatives that the distribution is skewed and/or kurtic.

http://en.wikipedia.org/wiki/D’Agostino’s_K-squared_test

D’Agostino tests the skew and kurtosis of a distribution. Failing the test indicates that the distribution is skewed or kurtic to the point that it is not normal. Passing the test is not proof positive that the distribution is in fact normal.
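To make the "transformations of the sample kurtosis and skewness" concrete, here is a minimal Python sketch of the usual textbook construction: the skewness transformation of D'Agostino (1970), the kurtosis transformation of Anscombe and Glynn (1983), and their sum of squares referred to a chi-squared distribution with 2 degrees of freedom. The function name `dagostino_k2` and the `n >= 20` guard are my own choices for illustration. This is not the fBasics source, and its output may differ from dagoTest in the last digits.

```python
import math


def dagostino_k2(sample):
    """Return (z_skew, z_kurt, k2, p_omnibus) for a sequence of numbers."""
    n = len(sample)
    if n < 20:
        # Assumption: the kurtosis approximation is usually quoted for n >= 20.
        raise ValueError("need at least 20 observations")
    mean = sum(sample) / n
    # Biased (divide-by-n) central moments.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    g1 = m3 / m2 ** 1.5  # sample skewness (0 for a normal distribution)
    b2 = m4 / m2 ** 2    # sample kurtosis (3 for a normal distribution)

    # Skewness transformation (D'Agostino, 1970) -> approximately N(0, 1).
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)

    # Kurtosis transformation (Anscombe and Glynn, 1983) -> approximately N(0, 1).
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - mean_b2) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (
        2.0 / sqrt_beta1 + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0:
        raise ValueError("kurtosis transformation is undefined for this sample")
    # Signed cube root, so a negative denominator does not produce a complex number.
    cube_root = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    z_kurt = ((1.0 - 2.0 / (9.0 * a)) - cube_root) / math.sqrt(2.0 / (9.0 * a))

    # Omnibus statistic; for 2 degrees of freedom the chi-squared
    # upper tail probability is exp(-k2 / 2).
    k2 = z_skew ** 2 + z_kurt ** 2
    p_omnibus = math.exp(-k2 / 2.0)
    return z_skew, z_kurt, k2, p_omnibus


# A perfectly symmetric but flat sample: no skew, low kurtosis.
z3, z4, k2, p = dagostino_k2(list(range(-50, 51)))
print(f"Z3={z3:.4f}  Z4={z4:.4f}  K2={k2:.4f}  p={p:.3g}")
```

The evenly spaced sample in the usage line is symmetric, so only the kurtosis component can contribute to the omnibus statistic.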

We’ll first take a look at three intentionally distorted distributions: a high kurtosis, a low kurtosis, and a skewed distribution. I had trouble skewing the distribution without also triggering kurtic indicators in the D’Agostino test. The results of each D’Agostino test follow the displayed distribution.
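As a rough illustration of how such distorted samples can be produced, the Python sketch below draws from three standard distributions and reports the sample skewness and excess kurtosis of each. The choice of distributions (Laplace, uniform, exponential), the sample size, the seed, and the function names are all assumptions made for this example, not necessarily how rn5, rn6, and rn7 were built. It also suggests why skew is hard to isolate: the exponential distribution has a theoretical skewness of 2 but also an excess kurtosis of 6.

```python
import random


def skew_and_excess_kurtosis(sample):
    """Sample skewness and excess kurtosis from divide-by-n central moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def make_samples(n=5000, seed=0):
    """Hypothetical stand-ins for a normal sample and three distorted ones."""
    rng = random.Random(seed)
    return {
        "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
        # The difference of two independent exponentials is Laplace distributed.
        "high kurtosis (Laplace)": [
            rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)
        ],
        "low kurtosis (uniform)": [rng.uniform(-1.0, 1.0) for _ in range(n)],
        "skewed (exponential)": [rng.expovariate(1.0) for _ in range(n)],
    }


samples = make_samples()
for name, values in samples.items():
    skew, kurt = skew_and_excess_kurtosis(values)
    print(f"{name:26s} skewness={skew:+.3f}  excess kurtosis={kurt:+.3f}")
```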

The D’Agostino test is available as dagoTest in fBasics, the financial basics package from rmetrics.org.

    dagoTest(rn5)
    Title:
     D'Agostino Normality Test
    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 43.6439
        Z3  | Skewness: -0.5324
        Z4  | Kurtosis: 6.5849
      P VALUE:
        Omnibus  Test: 3.333e-10
        Skewness Test: 0.5945
        Kurtosis Test: 4.553e-11

    dagoTest(rn6)
    Title:
     D'Agostino Normality Test
    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 34.441
        Z3  | Skewness: -0.3546
        Z4  | Kurtosis: 5.8579
      P VALUE:
        Omnibus  Test: 3.321e-08
        Skewness Test: 0.7229
        Kurtosis Test: 4.687e-09

    dagoTest(rn7)
    Title:
     D'Agostino Normality Test
    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 41.7927
        Z3  | Skewness: -5.8956
        Z4  | Kurtosis: 2.6523
      P VALUE:
        Omnibus  Test: 8.41e-10
        Skewness Test: 3.734e-09
        Kurtosis Test: 0.007994

Very low p-values indicate that the distribution fails the corresponding test, whether the Omnibus test (the overall D’Agostino test for ‘could be normal’) or its Skewness and Kurtosis subcomponents. In the outputs above, every Omnibus p-value is below 1e-7, so all three distorted distributions are rejected as normal.
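The link between the reported statistics and their p-values can be checked by hand. The Python sketch below assumes the standard construction of the test: Z3 and Z4 are compared with a standard normal distribution (two-sided), and the omnibus statistic with a chi-squared distribution with 2 degrees of freedom. The function names and the conventional 0.05 cut-off are my own additions. The statistics are copied from the three dagoTest outputs in this post, and under these assumptions the computed p-values appear to agree with the printed ones.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a statistic that is N(0, 1) under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def omnibus_p(chi2):
    """Upper tail of a chi-squared distribution with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# (Chi2 omnibus, Z3 skewness, Z4 kurtosis) as reported in this post.
reported = {
    "rn5": (43.6439, -0.5324, 6.5849),
    "rn6": (34.441, -0.3546, 5.8579),
    "rn7": (41.7927, -5.8956, 2.6523),
}

ALPHA = 0.05  # assumed significance level, not taken from the post

for name, (chi2, z3, z4) in reported.items():
    p_omni = omnibus_p(chi2)
    verdict = "rejected" if p_omni < ALPHA else "not rejected"
    print(f"{name}: omnibus p={p_omni:.4g}  skewness p={two_sided_normal_p(z3):.4g}  "
          f"kurtosis p={two_sided_normal_p(z4):.4g}  normality {verdict}")
    # The omnibus statistic should equal Z3^2 + Z4^2 up to rounding.
    print(f"      Z3^2 + Z4^2 = {z3 ** 2 + z4 ** 2:.4f} (reported Chi2 = {chi2})")
```

A p-value above the cut-off only means the test found no evidence against normality, which matches the earlier caveat that passing is not proof that the distribution is normal.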

Now we can look at the four distributions from the previous post. Each set contains more residuals about the mean than the one before.

1) GISTEMP annual 1970-2010

2) GISTEMP annual 1880-2010

3) GISTEMP monthly 1970-2010

4) GISTEMP monthly 1880-2010

Set 1, annual 1970-2010:

    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 3.3396
        Z3  | Skewness: -0.4131
        Z4  | Kurtosis: -1.7801
      P VALUE:
        Omnibus  Test: 0.1883
        Skewness Test: 0.6795
        Kurtosis Test: 0.07505

Set 2, annual 1880-2010:

    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 0.6071
        Z3  | Skewness: 0.0594
        Z4  | Kurtosis: -0.7769
      P VALUE:
        Omnibus  Test: 0.7382
        Skewness Test: 0.9526
        Kurtosis Test: 0.4372

Set 3, monthly 1970-2010:

    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 0.0225
        Z3  | Skewness: -0.1197
        Z4  | Kurtosis: 0.0904
      P VALUE:
        Omnibus  Test: 0.9888
        Skewness Test: 0.9047
        Kurtosis Test: 0.928

Set 4, monthly 1880-2010:

    Test Results:
      STATISTIC:
        Chi2 | Omnibus: 0.3223
        Z3  | Skewness: 0.032
        Z4  | Kurtosis: -0.5668
      P VALUE:
        Omnibus  Test: 0.8512
        Skewness Test: 0.9745
        Kurtosis Test: 0.5709

The test results do not uniformly improve with increasing data points. The monthly 1880-2010 set of residuals is not an improvement over the monthly 1970-2010 set (Omnibus p-value 0.8512 versus 0.9888). This probably indicates that the 1940-1970 cooling trend is throwing off the distribution around a linear mean.

On the other hand, the 1880-2010 yearly set is a definite improvement over the 1970-2010 yearly set (Omnibus p-value 0.7382 versus 0.1883), probably because of the increase in the number of points available.