
## Lines, Sines, and Curve Fitting 9 – Girma

2011 January 17

Dr G. Orssengo recently brought to my attention his “line+sine” model, which was presented at WUWT in April 2009. In short, his model is

y = (m*(x-1880) + b) + (A * cos(((x-1880)/T)*(2*pi)))

intercept (1880) b = -0.53
slope m = 0.0059 C / yr
amplitude A = 0.3 C
period T = 60 years

He displays it as follows:

I’ve recharted Dr Orssengo’s graph as follows:
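For reference, the line+sine model above is easy to evaluate directly. The following Python sketch just plugs in the published parameters; no fitting is done here:

```python
import math

# Dr Orssengo's "line+sine" model as written above: a linear trend from
# 1880 plus a 60-year cosine, using the quoted parameters.
def line_sine(x, m=0.0059, b=-0.53, A=0.3, T=60.0):
    return m * (x - 1880) + b + A * math.cos(((x - 1880) / T) * 2 * math.pi)

# At 1880 the cosine is at its maximum, so y = b + A = -0.23
print(round(line_sine(1880), 4))  # → -0.23
```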

Dr Orssengo has commented here that his correlation is 0.88. Using annualized HadCRUT3 data, however, the correlation I calculate from 1880-2009 is a bit lower, at 0.87. Simply reducing the amplitude of the cosine to 0.25 C raises the correlation to 0.89. I suspect that Girma fit his model by hand and did not try to optimize the values used.

Dr Orssengo has put a lot of emphasis on the correlation of his model, and has allowed that an equally simple model with a higher correlation would be the better model. I now present a simple model, based on the same data, with a higher correlation.

First, a simple exponential trend:

y1 = -a/100 + b/100 * exp((k/10000)*(x-1880))

a <- 44.40
b <- 6.02
k <- 211
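As with the line+sine model, the exponential trend can be evaluated directly. A minimal Python sketch using the coefficients above (note the scaling: a and b are in hundredths, k in ten-thousandths):

```python
import math

# The exponential trend as written above, with the published coefficients.
def exp_trend(x, a=44.40, b=6.02, k=211.0):
    return -a / 100 + (b / 100) * math.exp((k / 10000) * (x - 1880))

# At 1880 the exponential term equals b/100, so y1 = -0.4440 + 0.0602 = -0.3838
print(round(exp_trend(1880), 4))  # → -0.3838
```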

This model has a correlation of 0.89 for HadCRU, 0.90 for NCDC, and 0.91 for GISTEMP, all through 2009. Since HadCRU is the data set that Dr Orssengo used, we will judge against it. And 0.89 is higher than the 0.88 that Girma has calculated for his model, or the 0.87 that I calculated for it.

Based on the criterion that the best correlation makes the best model, my exponential warming model beats Dr Orssengo’s line+sine model.

Will the model hold? No, it probably will not. If it does, that will be a whole lot of warming!

To be fair, Dr Orssengo points out that the future is a better filter for selecting models than correlation. I’ll address that – with a slightly better measure of fit – in a couple of days.

And one more note. While curve fitting is fun, divorced from a physical explanation it isn’t much more than fun. What drives the linear (or exponential) trend? What drives the 60-year sine? Without knowing these, the mathematical model is a sterile construct.

1. 2011 January 17 at 7:01 am

Dance could be seen as curve fitting.

I wonder now how many of Girma’s drive-bys should be amended with a link to that post.

2. 2011 January 17 at 5:50 pm

The dance of regressions. You made me smile. 😀

As to Girma’s obfuscatory links: the principle of charity informs me that he is just being polite and keeping links short. The cynical me isn’t worried about driving up traffic at WUWT or increasing his Google ranking – I’m just not that big.

3. 2011 January 18 at 2:12 am

“To be fair, Dr Orssengo points out that the future is a better filter for selecting models than correlation”

Thanks Ron.

In the next couple of years, whether we have milder or harsher winters will show us whether the exponential or the sinusoidal model represents the reality.

4. 2011 January 18 at 2:28 am

Ron

One very important point, your exponential model does not show the big freeze of the 1970s and the dust bowl of the 1940s. The sinusoidal model does.

The Big Freeze of the 1970s
http://bit.ly/g3dtPb

The Dust Bowl of the 1940s
http://bit.ly/hXNYAA

5. 2011 January 18 at 8:42 am

In the next couple of years, whether we have milder or harsher winters will show us whether the exponential or the sinusoidal model represents the reality.

Well. No. The integrated global temperature anomalies will tell us. Not the seasonal temperatures in one region or another.

One very important point, your exponential model does not show the big freeze of the 1970s and the dust bowl of the 1940s. The sinusoidal model does.

Well, how about a sinusoidal exponential model?
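For illustration, a sinusoidal exponential model could simply add the 60-year cosine to the exponential trend. The parameters below are borrowed from the two models already quoted, not fitted to anything, so the numbers are purely illustrative:

```python
import math

# Illustrative "sinusoidal exponential": the exponential trend plus the
# 60-year cosine from the line+sine model. Parameters borrowed, not fitted.
def exp_sine(x, a=44.40, b=6.02, k=211.0, A=0.3, T=60.0):
    trend = -a / 100 + (b / 100) * math.exp((k / 10000) * (x - 1880))
    cycle = A * math.cos(((x - 1880) / T) * 2 * math.pi)
    return trend + cycle

# One full 60-year cycle apart, the cosine cancels out of the difference,
# leaving only the exponential trend's contribution.
print(round(exp_sine(1940) - exp_sine(1880), 4))
```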

6. 2011 January 18 at 9:50 am

Blog comments as dances of regressions would explain a lot, perhaps a bit too much.

A sinusoidal exponential model makes my head ache.

7. 2011 January 18 at 10:11 am

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

http://en.wikipedia.org/wiki/Regression_analysis

Why do I have a vision of blog-mining bots doing social network analysis on climate blogs?

8. 2011 January 18 at 11:48 am

Ron

“Well, how about a sinusoidal exponential model?”

Much better.

However, what is its correlation coefficient? It must be higher than your previous value of 0.89.

Ron, I doubt your previous value of 0.89. How can you have such a high value when the exponential model is monotonically increasing while the data is not?

9. 2011 January 18 at 12:21 pm

Girma, the results are easy to replicate.
I would be grateful if you did so in Excel.

The script which I used is located here:
http://rhinohide.org/gw/trendtester/tt-nls.R

Note that there is a line which sets which data source is being used:

# column idx
gis <- 3
cru <- 4
noa <- 5
uah <- 7
idx <- gis

And the script output (not including graphics) is found here:
http://rhinohide.org/gw/trendtester/tt-nls.Rout

10. 2011 January 18 at 12:47 pm

Ron

http://bit.ly/c0Jvh0

11. 2011 January 18 at 1:16 pm

It looks roughly periodic. It is also missing the data for 2010.

12. 2011 January 18 at 4:30 pm

Ron

As the above pattern was valid for the past 130 years, is it not reasonable to assume it will also be valid for the next 20 years?

As a result, cooling until 2030?

13. 2011 January 18 at 4:42 pm

Ron

I have one objection with your exponential model.

Its estimates of the local maxima of 1880 and 1940 and the local minima of 1910 and 1970 are poor.

Can you create a model based ONLY on the following local minima and maxima values?

1880=>-0.2
1910=>-0.6
1940=>0.1
1970=>-0.3
2000=>0.5

14. 2011 January 20 at 3:01 pm

“As the above pattern was valid for the past 130 years, is it not reasonable to assume it will also be valid for the next 20 years?”

Because we have more information than just 130 data points. Part of it is that we have spatial as well as regional patterns, which tell us something different is going on in this warming than in the 1910-1940 one. But more importantly, we have data on the physical causes of temperature changes, such as solar, volcanic, aerosol, and greenhouse gas forcings.

If I watch a kid on a swing for 3 minutes, I might ask whether the next 3 minutes will have the same oscillation pattern as the first 3. But if I hear his mom calling for him to go home, I have information which suggests that perhaps the next oscillations won’t be the same after all.

Also: have you ever heard the expression, ‘with four parameters I can fit an elephant and with five I can make him wiggle his trunk,’ attributed to Enrico Fermi? (Ignoring the fact that it actually takes about 30 parameters to get a good line drawing of an elephant.) One good test is to use your approach on the data from 1880 to 1990 and see how well your method* captures the period from 1990 to 2010. My guess is that your method will work poorly.

*Here, I define your method as “fit the data with a combination of sine curves and linear trends, and optimize until you get the very best R^2 value”.

15. 2011 January 21 at 4:13 pm

One good test is to use your approach on the data from 1880 to 1990, and see how well your method* captures the period from 1990 to 2010

Which is pretty much what was done at the following, but with 1900-2000 as the model period.
https://rhinohide.wordpress.com/2011/01/14/lines-sines-and-curve-fitting-6-backcast-and-forecast/

16. 2011 January 24 at 7:52 am

“Which is pretty much what was done at the following, but with 1900-2000 as the model period”

Yeah, that’s a nice way of looking at things. Still, there is the danger that if you throw 1000 methods at a given problem, reserving parts of the time record no longer avoids overfitting… you could do a two-stage “reserve”: throw 100 models at 80% of the time period, pick the couple of models that best fit the next 10%, and then check that those models also do reasonably well on the last 10%…

-M
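The two-stage “reserve” M describes can be sketched as a simple index split; the 80/10/10 proportions below are just the ones suggested above:

```python
# Sketch of a two-stage "reserve": tune candidate models on the first 80%
# of the record, select among them on the next 10%, and confirm the
# selected few on the final 10%. Proportions are illustrative.
def two_stage_split(values):
    n = len(values)
    i, j = int(n * 0.8), int(n * 0.9)
    return values[:i], values[i:j], values[j:]

years = list(range(1880, 2010))               # 130 annual values, 1880-2009
train, select, confirm = two_stage_split(years)
print(len(train), len(select), len(confirm))  # → 104 13 13
```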

17. 2011 January 26 at 4:15 am

@Girma
“As the above pattern was valid for the past 130 years, is it not reasonable to assume it will also be valid for the next 20 years?

As a result, cooling until 2030?”

Err, no. That data has a linear trend removed, and if you plot the real data you can clearly see that the linear trend is stronger than the amplitude of the presumed 60-year oscillation. So even if this oscillation were to continue for the coming 20 years, there would still be global warming, albeit with a smaller slope. And furthermore, I cannot say it better than Ron: “While curve fitting is fun, divorced of a physical explanation, they aren’t much more than fun”

18. 2011 January 26 at 12:36 pm

milanovic,

But we do have a physical explanation. The GHG forcing curve is fit very well by an exponential function, and the modulation is caused by a cyclic variation in heat transfer alternately favoring the NH and the SH. Since this cyclic variation is not modeled by GCMs, the empirical fit has more skill for the past and is likely to have more skill over the next decade or two than current GCMs, coupled or not. Even better would be to use a rational scenario for GHG emissions rather than a simple exponential. But the IPCC emission scenarios don’t deviate much from an exponential curve for the next decade or two either.

When the cyclic component is included, you don’t need all the aerosol forcing fudge factors to explain the variation of global temperature in the twentieth century, so the short-term climate sensitivity is at the low end of the IPCC range.

19. 2011 January 26 at 5:16 pm

Hey Ron, you gifted Girma two free parameters (actually three if you include the extra functional form). But the serious question is: if you use the same baseline, what is the quality of the fit to the combination of the three surface records? Or, if you prefer, how do the fitting parameters vary among them?

21. 2011 January 27 at 1:14 am

“caused by a cyclic variation in heat transfer alternately favoring the NH and the SH”

What evidence can you give for a fixed cycle with a period of approximately 60 years? I don’t know of any. The PDO and ENSO, for example, have no such cycle (see for example http://tinyurl.com/4u3g73y). But anyway, my main point was that even if such a cycle existed (which I doubt), the amplitude would clearly be too small to cause global cooling in the coming decades, as Girma suggested. If you want to put faith in this curve-fitting procedure, then the exponential + sinusoidal fit does not predict any cooling. The link he referenced himself, http://bit.ly/c0Jvh0, also made that clear, because the linear trend that was subtracted to get that graph is larger than the amplitude of the resulting oscillation.

22. 2011 January 27 at 1:00 pm

milanovic,

Of course it’s not fixed. The planetary system is likely chaotic, or at least shows long-term persistence. That type of system frequently exhibits quasi-periodic behavior at all time scales. The best evidence of a ~60 year oscillation that is occurring now is the AMO index. It won’t continue forever. But for the relatively short term, the null hypothesis would be that the currently observed behavior will continue. And I never said or implied that global average temperature will decline. But I do think it likely that the rate of increase for the next decade or two will be significantly (in the frequentist statistical sense) less than the current GCM runs predict.

The bipolar seesaw effect is also well known, although research emphasis is mainly on longer-term cycles like Heinrich and Dansgaard-Oeschger events. See here for example, or google “polar seesaw”.

23. 2011 January 28 at 12:56 am

@DeWitt Payne
Thanks for the reply. I was not aware of the ~60-year oscillation in the AMO. Of course I meant “fixed” as quasi-fixed; we should at least expect it to continue for the next cycle.
“And I never said or implied that global average temperature will decline.”
I know you didn’t, but my post was a reply to Girma’s post (18-01, 4.30pm), who did imply that. So we are more or less talking past each other, I’m afraid.

24. 2011 January 28 at 6:28 am

The best evidence of a ~60 year oscillation that is occurring now is the AMO index.

I admit that I haven’t looked at it in detail, but that doesn’t seem very convincing in and of itself. That record shows essentially one long decline to a minimum, one rise and fall to another minimum, and then another rise. Since we don’t know when the pre-1900 peak was, nor the post-1990 peak, there’s really only one complete “oscillation” in the modern record, right? That’s not much to base any conclusions on regarding the behavior of that signal over the next couple of decades, IMHO.

Again, though, I haven’t read much about the AMO. Perhaps there are other reasons (aside from the record itself) for expecting it a priori to show a ~60-year oscillation.

25. 2011 January 28 at 9:06 am

Just FYI – I am pretty neutral on the 60-year oscillation stuff. Solar? Sea-based resonance? Ice-based resonance? Or just a fluke based on 30 years of cooling due to aerosols? Although, looking at the exponential fits, it is the 1910-1940 leg that is out of alignment (which might raise questions about the pre-1940 instrument record).

26. 2011 January 28 at 10:51 am

Ned,

What we don’t know is where or whether there was a minimum prior to 1856. We have at least two full cycles, though, which is enough for a curve fit and a testable hypothesis. If we don’t see a significant decline in the AMO index in the next decade, the hypothesis that it’s cyclical with a period of 60-70 years fails. In fact, I’ll be very suspicious if it doesn’t decline significantly in the next five years. It looked like it was declining until the recent El Nino.

27. 2011 January 29 at 4:43 am

DeWitt Payne writes: We have at least two full cycles, though, which is enough for a curve fit and a testable hypothesis.

Well, maybe. I only see one definite peak (1940s-ish). I take it you’re assuming there’s another peak some time there prior to 1900, and one some time in the past decade or so, right? Any individual year can be an outlier, but it seems a bit risky to assume that we’ve seen the most recent peak when the most recent year’s value is actually the highest in the past century. Given the noise in that signal, it might be a little hard to pin down the local maximum until many years after we’ve passed it.

Frankly, trying to discern a “period” for the AMO in the 20thC data and use that to forecast temperature trends seems pretty close to reading tea leaves, IMHO. In the literature you see it referred to as a “60-80 years” period (e.g., Poore et al. 2009, who specifically state that the instrumental record isn’t long enough to clearly establish the period of the AMO cycle). If you assume a peak in the mid-1940s and a period of 60 years, then we should have already passed the second peak. If you assume a period of 80 years, the second peak wouldn’t be until the 2020s.

Maybe you could pin down this oscillation a bit better if you could extend the record using proxies. But the only examples I’ve seen (e.g., Poore et al’s fig. 3) don’t give me much confidence in specific claims about the “period” of the AMO. It just seems to be all over the map.

Not trying to be cranky … sorry if it comes across that way.

28. 2011 January 29 at 10:45 am

Ned,

This year’s peak was the third highest:

1878 0.448
1998 0.394
2010 0.381

All were big El Nino years. 1878 was particularly bad.

If I were really gung ho, I’d do an FFT power spectrum. I’m betting there’s a big peak in the spectrum near 60 years. I bet there are also smaller peaks near 11 and 22 years. Obviously it’s not proof. There is no proof in science, as opposed to math; there’s only falsification.
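Such an FFT power-spectrum check might look like the following sketch. The series here is synthetic (a built-in 60-year cosine plus a little noise), not the real AMO index, so it only shows the mechanics:

```python
import numpy as np

# Synthetic annual series with a 60-year cycle plus noise (NOT the AMO).
rng = np.random.default_rng(0)
years = np.arange(1890, 2010)                    # 120 annual values
signal = 0.2 * np.cos(2 * np.pi * (years - 1890) / 60.0)
signal = signal + 0.05 * rng.standard_normal(years.size)

# Power spectrum of the mean-removed series; frequencies in cycles/year.
power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(years.size, d=1.0)

peak = np.argmax(power[1:]) + 1                  # skip the zero frequency
print(round(1 / freqs[peak]))                    # dominant period → 60
```

With only two full cycles in a 120-year window, adjacent frequency bins are far apart at long periods, which is exactly the kind of resolution limit a short record imposes on any claimed ~60-year peak.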

29. 2011 January 29 at 6:57 pm

Here’s one I did earlier – not good, but it gives the general impression.

The main peaks are at 68, 21, 15, and 9 years.
The lack of resolution at 68 means the peak could be anywhere between 55 and 68 years.

30. 2011 January 31 at 1:33 am

Well, this post puts the AMO discussion here in a different context.

http://tamino.wordpress.com/2011/01/30/amo/#more-3425

I did not know that the AMO index is defined by the temperature anomaly. Therefore, claiming that the AMO is responsible for global warming carries a large risk of circular reasoning.

31. 2011 January 31 at 7:50 am

Is it coincidence that Tamino keeps answering questions that I’m asking? 😆

I first ran into the AMO as the detrended anomaly on wiki when I was looking harder at Hank Robertson’s (?)* David Benson’s CO2+AMO model.

* edit to correct author; here is the link
http://www.realclimate.org/index.php/archives/2010/10/unforced-variations-3-2/comment-page-5/#comment-189329

32. 2011 January 31 at 8:00 am

I don’t know, maybe he is following your blog closely 🙂
Before Tamino’s post I was not aware of the definition of the AMO, but knowing that definition, it is clear you have to be extremely careful when studying the AMO, global temperature, and causation between the two.

33. 2011 January 31 at 11:28 am

Obviously some physical phenomenon causes the AMO index to behave the way it does. The AMO index is supposed to reflect variations in the actual Atlantic Meridional Overturning circulation. That circulation is the cause of the temperature anomalies used to calculate the index, not the other way around.

The NH has more land area than the SH. That makes the average heat capacity of the NH lower than the SH. The temperature range of the annual cycle in the NH is then larger than the SH. When you add the two together, you don’t get zero, you get a cycle that looks like a smaller version of the NH temperature with a small phase shift. If you then vary the relative amount of heat transferred to the NH vs the SH by ocean circulation, then the NH will warm and cool more than the SH. The result will be a cycle in the global average temperature even though global heat content remains the same.
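The arithmetic of that argument can be checked with a toy example: two hemispheric seasonal cycles of unequal amplitude and opposite phase do not cancel, and their mean retains a residual annual cycle in phase with the larger (NH) cycle. The amplitudes below are invented for illustration:

```python
import numpy as np

# NH seasonal cycle has a bigger swing (less ocean, lower heat capacity)
# and is half a year out of phase with the SH cycle. Their mean does not
# cancel; a residual cycle in phase with the NH remains.
months = np.arange(12)
nh = 12.0 * np.cos(2 * np.pi * (months - 6) / 12)  # peaks in July
sh = 5.0 * np.cos(2 * np.pi * months / 12)         # peaks in January

global_mean = (nh + sh) / 2
print(round(global_mean.max() - global_mean.min(), 1))  # residual range → 7.0
```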

34. 2011 February 1 at 1:16 am

The AMO index is supposed to reflect variations in the actual Atlantic Meridional Overturning circulation.

Well, that’s the whole point, isn’t it? When you subtract only a linear function that should account for the GHG increase, and then define all other variation to be due to the AMO, you might overlook other global effects, such as aerosol cooling during the 40s. You would define those temperature variations to be part of the AMO and in that way get a 60-year oscillation. Unfortunately, due to the somewhat circular definition, there is no way of finding out whether all these temperature variations indeed belong to the AMO.

35. 2011 February 1 at 8:57 am

There is indeed no way from analysis of past data. That’s not the point. What’s going to happen over the next decade is what’s relevant. Will the AMO index decrease, and even go negative? Will, as a result, global average temperature remain below the model trends? Will Arctic sea ice loss slow and Antarctic loss increase? If none of that happens, then the aerosol-only explanation for mid-twentieth-century cooling is strengthened and a high climate sensitivity is more likely. However, there’s still the problem of the relatively rapid warming in the early twentieth century. A solar-only explanation of that, if Leif Svalgaard is correct about the lack of variation in TSI, has problems.

36. 2011 February 2 at 12:45 am

I agree for the most part with that analysis. However, even if the AMO index and global temperature are correlated in the next decade, it will be extremely difficult to establish that the AMO was the cause, because the AMO index will be influenced by global temperatures as well.
