Lines, Sines, and Curve Fitting 7 – normal

2011 January 15

Zeke Hausfather has tendered a challenge to Joe Bastardi regarding future warming versus cooling, a bet similar to the “Did Global Warming Stop …” series I ran through earlier this month.

Zeke describes a portion of the bet as follows:

The graph below shows the trend in annual (Jan-December) temperatures from 1970 to 2010, with two standard deviations of the detrended residuals around the trend to show expected confidence intervals of variability. This means that on average, we would expect only 2.5% of observations to exceed the red upper dotted line and 2.5% of observations to fall below the lower dotted in any given year. The linear trend and confidence intervals for the 1970 to 2010 data are extended up to 2030 to provide a testable projection.

I’ve been meaning to test this. Is the distribution of the detrended global temperature anomalies ‘normal’? Which is to say, do the residuals around the OLS trend follow a Gaussian distribution?

Worried about the counter-trend during 1940-1970, I’ll first just test 1970-2010 using the annualized data for GISTEMP.

GISTEMP residuals histogram

Hmmm. That does not look so good. However, there are hopeful signs: 66% of the residuals are within 1 sd of the mean. But there are 0 points out in the 2+ sd tails.
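For anyone wanting to repeat the check, here is a minimal sketch of the detrend-and-count step. It assumes NumPy and uses a synthetic series with an arbitrary slope as a stand-in; it is not the GISTEMP data, and the function name `residual_coverage` is just illustrative.

```python
import numpy as np

def residual_coverage(years, anomalies):
    """Fit an OLS line, then report how the detrended residuals spread."""
    years = np.asarray(years, dtype=float)
    anomalies = np.asarray(anomalies, dtype=float)
    slope, intercept = np.polyfit(years, anomalies, 1)   # degree-1 least squares
    residuals = anomalies - (slope * years + intercept)
    sd = residuals.std(ddof=1)   # sample sd; the ddof choice is an assumption
    z = np.abs(residuals) / sd
    return {
        "slope": slope,
        "sd": sd,
        "within_1sd": float(np.mean(z <= 1.0)),
        "within_2sd": float(np.mean(z <= 2.0)),
    }

# Synthetic stand-in: 40 annual values, arbitrary trend, sd of 0.097
rng = np.random.default_rng(0)
years = 1970 + np.arange(40)
anomalies = 0.017 * (years - 1970) + rng.normal(0.0, 0.097, years.size)

stats = residual_coverage(years, anomalies)
print(stats)
```

Swapping the synthetic arrays for the real annual anomalies should give the same kind of percentages quoted in the post.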

Just as a reminder, here are 40 random points drawn from a normal distribution with a standard deviation of 0.097 (the same as the 40 year GISTEMP sd).
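How ragged should 40 truly normal points be expected to look? A rough simulation sketch, assuming NumPy and an arbitrary seed and trial count, repeats the 40-point draw many times and tracks the same two numbers: the share within 1 sd and the count beyond 2 sd.

```python
import numpy as np

rng = np.random.default_rng(42)      # arbitrary seed
n, sd, trials = 40, 0.097, 10_000    # trial count is an arbitrary choice

samples = rng.normal(0.0, sd, size=(trials, n))

# Standardise each sample by its own mean and sd, as one would with real residuals
centred = samples - samples.mean(axis=1, keepdims=True)
z = centred / samples.std(axis=1, ddof=1, keepdims=True)

within_1sd = (np.abs(z) <= 1).mean(axis=1)   # share within 1 sd, per sample
beyond_2sd = (np.abs(z) > 2).sum(axis=1)     # count beyond 2 sd, per sample

lo, hi = np.percentile(within_1sd, [2.5, 97.5])
no_tail_share = float(np.mean(beyond_2sd == 0))

print(f"share within 1 sd: central 95% of samples span {lo:.2f} to {hi:.2f}")
print(f"samples with no point beyond 2 sd: {no_tail_share:.1%}")
```

For independent normal draws with a known sd, the chance that none of 40 points lands beyond 2 sd is roughly 0.9545^40, or about 15%, so an empty tail in a sample this small is not unusual on its own.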

normal 40 random

Frankly, the random normal data doesn’t look much better than the anomaly data. Just not enough points. So let’s go ahead and get more points by using the full 130 years. This does broaden the standard deviation up to 0.127.

GISTEMP residuals histogram

A bit better but still pretty ragged, although the anomaly data looks more normal than the random normal data. Here again, we look at 130 random points generated from a normal distribution for comparison.

normal 130 random

Still not very convincing – for either of them! But the anomaly data has 67% of the points within +/- 1 sd of the mean, and 96.2% of the points are within +/- 2 sd. The normal distribution numbers are improving slightly. We can increase the number of data points available again by using monthly data instead of annual data.

GISTEMP residuals histogram

Definitely better. Now 69% of the points are within +/- 1 sd of the mean, and 94.9% of the points are within +/- 2 sd. The curve is smoother. Yet, we can increase the number of data points still one more time by using the whole data range.

GISTEMP residuals histogram

Within +/- 1 sd of the mean, there are 68% of the points. And 95.2% of the points are within +/- 2 sd.

It seems that the temperature anomalies are reasonably close to a normal distribution about the mean. Gonna have to do better than “seems” and “reasonably”, though.

  1. 2011 January 15 at 10:12 pm

    For some time I have been playing with an Excel spreadsheet that reconstructs HADCRUT temperature data from cosine functions. The latest version uses reconstructed TSI data as an input.

    Obtaining the repetitive frequencies:
    Using bandpass filters set to a bandwidth of f/150 (where f is the lower frequency of the BP filter), the filter is swept from 0.5 year to 150 years looking for amplitude peaks.
    As each peak is found, the output from the filter is compared to the output of a cosine wave set to the filter output amplitude and to the centre frequency of the filter. The cosine wave is then phase shifted to align the two waveforms.
    This is repeated for as many peaks as are found in the HADCRUT3V record. Outputs from the BP filters that fall into likely TSI frequencies (10, 11, 12, 14, 21 years) have their corresponding synthesized outputs nulled. The 0.5 year output is also nulled.
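The align-a-cosine step can also be done by least squares rather than by shifting the phase by hand. The sketch below is a swapped-in technique, not the spreadsheet's method: it assumes NumPy, takes the period as already known, skips the bandpass sweep entirely, and runs on a made-up signal. The name `fit_cosine` and the 9.1 year period are purely illustrative.

```python
import numpy as np

def fit_cosine(t, y, period):
    """Least-squares amplitude and phase of a cosine with a fixed period.

    Model: y ~ A*cos(w*t - phi) = a*cos(w*t) + b*sin(w*t),
    which is linear in a = A*cos(phi) and b = A*sin(phi).
    """
    w = 2.0 * np.pi / period
    design = np.column_stack([np.cos(w * t), np.sin(w * t)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = float(np.hypot(a, b))
    phase = float(np.arctan2(b, a))   # radians
    return amplitude, phase

# Made-up monthly signal: one cosine plus noise (not HADCRUT data)
rng = np.random.default_rng(7)
t = np.arange(0.0, 150.0, 1.0 / 12.0)   # time in years
true_amp, true_phase, period = 0.10, 0.7, 9.1
y = true_amp * np.cos(2 * np.pi * t / period - true_phase)
y = y + rng.normal(0.0, 0.02, t.size)

amp, phase = fit_cosine(t, y, period)
print(f"amplitude {amp:.3f}, phase {phase:.3f} rad")
```

Fitting the sine and cosine coefficients together finds amplitude and phase in one pass, which avoids stepping through trial phase shifts.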

    TSI is phase shifted and amplitude modified, then it is added to or subtracted from the sum of cosines (the shift is +48 months and it is subtracted). However, from the start of records to around 1900 a lower error is obtained by adding the TSI!

    The remaining synthesized outputs are then summed and modified by a multiplier constant (approx 2.69) set by minimising the error between synthesized and original HADCRUT.

    No low frequency peaks were found that could fit the rise in temperature. The rise is obtained by curve fitting the HADCRUT data with a 3rd order polynomial:
    y = 2.40389E-07x^3 - 1.34093E-03x^2 + 2.49320E+00x - 1.54550E+03

    This is added into the synthesised data to provide the output data.
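As a quick sanity check on the cubic quoted above, here is a small sketch that evaluates it with Horner's rule. It assumes x is the calendar year, which the comment does not state explicitly; the coefficients are copied as quoted.

```python
# Coefficients as quoted in the comment, highest power first
COEFFS = (2.40389e-07, -1.34093e-03, 2.49320e+00, -1.54550e+03)

def trend(x):
    """Evaluate the cubic at x using Horner's rule."""
    result = 0.0
    for c in COEFFS:
        result = result * x + c
    return result

# x is assumed to be the calendar year
for year in (1900, 1950, 2000):
    print(year, round(trend(year), 3))
```

The individual terms run into the thousands while the result is a fraction of a degree, so the six significant figures quoted can shift the output by a few hundredths of a degree through rounding alone. The spreadsheet's full-precision coefficients would be needed for anything finer.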

    Because the reconstructed data is just the sum of cosines, and TSI is relatively unimportant, the data can be extended into the future, in this case to about 2017.
    The plots obtained show 2011 to be cooling to mid year.


  2. 2011 January 16 at 2:27 am

    There was a discussion of normality of residuals in this Blackboard thread a while ago, and this follow-up.

    The Jarque-Bera test gives a sort of wrong-way-round statistic. It tells you if you have failed to reject the possibility that the data is normal. We really want some kind of affirmation. I don’t know of any quantification of that. The qq plots are a fancier way of eyeballing.
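To make the "wrong-way-round" point concrete, here is a minimal hand-rolled sketch of the Jarque-Bera statistic, assuming NumPy and using the asymptotic chi-square approximation with 2 degrees of freedom for the p-value. The sample is synthetic, not the GISTEMP residuals.

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic and asymptotic p-value.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and
    K the sample kurtosis (both from biased central moments).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = np.exp(-jb / 2.0)   # chi-square(2) survival function
    return float(jb), float(p_value)

# Synthetic stand-in for 130 detrended residuals
rng = np.random.default_rng(11)
jb, p = jarque_bera(rng.normal(0.0, 0.127, 130))
print(f"JB = {jb:.2f}, p = {p:.3f}")
```

A large p-value here only means that skewness and kurtosis gave no grounds to reject normality. It is not a positive confirmation that the data are normal.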

  3. 2011 January 16 at 7:36 am

    Thanks for the pointer, Nick. A 2007 link, must have been a memorable dance. I’ll be sure to read it. Any comments on how Jarque-Bera stacks up with D’Agostino?

    I appreciate the heads up on your post as well, TFP. I recognize your approach as from signal analysis – but I am not quite ready to dive into that … yet. Although I’ve been dipping my toes into some reading. I am very interested in getting a feel for the solar forcing – since it is the fundamental climate driver.

  4. 2011 January 16 at 9:01 am

    Ron Broberg 2011 January 16 at 7:36 am

    The plots on my page show that solar forcing has very minimal effects however you fiddle with the direction or phase (in fact, overlaying the plots of residuals, it is difficult to say there is any better curve fitting by using the TSI index). It seems to me that TSI forcing lies in the mush of noise remaining (generally +-0.15C in modern times).

    From the plots it is interesting to note that the 1998 and 1878 peaks are simulated.
    It is also interesting that the oddity from 1906 to 1946 is not captured.

  5. 2011 January 16 at 4:16 pm

    I got that first pointer wrong – it was this thread.

  6. 2011 January 16 at 10:53 pm

    On JB vs d’Agostino, both being tests of skewness and kurtosis, they must be similar. Wiki comments:
    “Considering normal sampling, and √β1 and β2 contours, Bowman & Shenton (1975) noticed that the statistic JB will be asymptotically χ2(2)-distributed; however they also noted that “large sample sizes would doubtless be required for the χ2 approximation to hold”. Bowman and Shenton did not study the properties any further, preferring the D’Agostino’s K-squared test.

    Around 1979, Anil Bera and Carlos Jarque while working on their dissertations on regression analysis, have applied the Lagrange multiplier principle to the Pearson family of distributions to test the normality of unobserved regression residuals and found that the JB test was asymptotically optimal (although the sample size needed to “reach” the asymptotic level was quite large). In 1980 the authors published a paper (Jarque & Bera 1980),”

    This paper seems to say that JB is optimal for large samples, but d’Agostino has small sample corrections which may be right for you.
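One rough way to see how the two tests behave at different sample sizes is to feed both of them data that really are normal and count how often each rejects at the 5% level. The sketch below assumes SciPy is available and uses `scipy.stats.normaltest` (the D'Agostino-Pearson K-squared test) and `scipy.stats.jarque_bera`. The sample sizes, trial count and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

def false_rejection_rates(n, trials=2000, alpha=0.05, seed=0):
    """Share of truly normal samples of size n rejected by each test."""
    rng = np.random.default_rng(seed)
    reject_k2 = 0
    reject_jb = 0
    for _ in range(trials):
        sample = rng.normal(0.0, 1.0, n)
        _, p_k2 = stats.normaltest(sample)     # D'Agostino-Pearson K-squared
        _, p_jb = stats.jarque_bera(sample)    # asymptotic chi-square p-value
        reject_k2 += int(p_k2 < alpha)
        reject_jb += int(p_jb < alpha)
    return reject_k2 / trials, reject_jb / trials

# Sizes loosely echoing the 40-year, 130-year and monthly cases in the post
for n in (40, 130, 1000):
    k2_rate, jb_rate = false_rejection_rates(n)
    print(f"n={n:5d}  K2 rejects {k2_rate:.3f}  JB rejects {jb_rate:.3f}")
```

A well-calibrated test should reject close to 5% of the time here. A rate that drifts away from 5% at small n suggests the asymptotic approximation has not yet taken hold at that sample size.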
