
GISTEMP: High Alt, High Lat, Rural

2010 March 8


In the Summary for Policy Makers in Watts’ and D’Aleo’s Surface Temperature Records: Policy Driven Deception, there is the following claim:

5. There has been a severe bias towards removing higher-altitude, higher-latitude, and rural stations, leading to a further serious overstatement of warming.

Earlier, I examined the effect of removing high altitude, high latitude and rural stations by using the GHCN v2.mean data set and applying the published CRU station gridding and averaging code. Here the exercise is repeated using GISTEMP inputs and methods.


The method is described in some detail in the hope that this will shed some fresh light on a process that is widely misunderstood.

Input Files

Updated Files:

The following files were freshly retrieved from their FTP and HTTP sites.

Antarctic Station Surface Data: stationpt.html
Antarctic Mean Monthly Surface Air Temperature Data: temperature.html
Antarctica AWS Data: awspt.html

Static Files:

The following files are not updated and are included in the GISTEMP source.

combine_pieces_helena.in: contains parameters for special handling of station 147619010 (St Helena)

t_hohenpeissenberg_200306.txt_as_received_July17_2003: contains an unusually long-lived record for central Europe (begins in 1781)

Ts.discont.RS.alter.IN: contains parameters for special handling of station 425911650 (LIHUE, KAUAI)

Ts.strange.RSU.list.IN_full: contains information for the exclusion of 65 records from 63 stations

v2.inv: similar to GHCN v2.temperature.inv but includes brightness information and Antarctic stations

9641C_200907_F52.avg: USHCN averages of fully adjusted monthly mean maximum and minimum temperatures, with estimates for missing values. This file is OBE (overtaken by events), but it is the one the public code is hard-coded to use. A copy is available from the Clear Climate Code project at http://code.google.com/p/ccc-gistemp/downloads/list in ccc-gistemp-test-2009-12-28.tar.gz (45 MB).


Step 0 takes the Antarctic files, parses them into GHCN v2 format, and merges them into the v2.mean file (v2.meanx). It then removes pre-1880 data (v2.meany). Next, the USHCN stations are removed from the GHCN data set and replaced with USHCNv2 stations (v2.meanz). Finally, the Hohenpeissenberg record is updated, and the final file, v2.mean_comb, is placed in a directory for use in the next step.
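The pre-1880 cut in Step 0 amounts to a simple row filter. The minimal Python sketch below assumes the GHCN v2.mean fixed-width layout: an 11-character station ID plus a duplicate digit, a 4-digit year, then twelve 5-character monthly values in tenths of a degree C, with -9999 for missing. The make_line helper is hypothetical and exists only to build test rows. This is an illustration, not the GISTEMP Step 0 code.

```python
# Sketch of the Step 0 "drop pre-1880 data" filter on v2.mean-style rows.
# ASSUMED layout (GHCN v2.mean): 11-char station id + 1-char duplicate
# digit, a 4-digit year, then twelve 5-char monthly means in tenths of a
# degree C, with -9999 marking a missing month.

def parse_v2_line(line):
    """Split one v2.mean-style row into (record id, year, monthly values)."""
    record_id = line[0:12]  # station id plus duplicate digit
    year = int(line[12:16])
    months = [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)]
    return record_id, year, months


def make_line(record_id, year, months):
    """Hypothetical helper: format a row in the same assumed layout."""
    return f"{record_id}{year:4d}" + "".join(f"{m:5d}" for m in months)


def drop_before(lines, first_year=1880):
    """Keep only rows whose year is first_year or later."""
    return [ln for ln in lines if parse_v2_line(ln)[1] >= first_year]


# The Riyadh station id from the inventory line quoted in the comments,
# with a duplicate digit of 0 appended.
rows = [
    make_line("223404380000", 1879, [-9999] * 12),
    make_line("223404380000", 1880, [12] * 12),
]
print(len(drop_before(rows)))  # prints 1
```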

Note that the GISTEMP procedure for replacing the US data has changed over the last several years, as described on the GISTEMP web page.


Step 1 takes the multiple records found for many individual station IDs and merges them into a single record per station. Overlapping and discontinuous records are handled differently. Records listed in Ts.strange.RSU.list.IN are excluded, and input from Ts.discont.RS.alter.IN and combine_pieces_helena.in is used in this processing. The output is a single file, Ts.txt, with one data table per station, each headed by a line of metadata.
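The following simplified Python sketch illustrates the idea of merging overlapping records. It is not GISTEMP's algorithm. It works on annual values, treats the longest record as the reference, shifts each further record by its mean difference from the running combination over their common years, and then averages it in. Records whose overlap falls below a hypothetical min_overlap parameter are simply skipped, whereas GISTEMP handles discontinuous records separately.

```python
def combine_records(records, min_overlap=4):
    """Merge several records (dicts of year -> value) for one station.

    The longest record is the reference; each further record is shifted by
    its mean difference from the running combination over their common
    years and then averaged in. min_overlap is a HYPOTHETICAL parameter:
    records with fewer common years are skipped in this sketch."""
    ordered = sorted(records, key=len, reverse=True)
    combined = dict(ordered[0])
    counts = {year: 1 for year in combined}  # records contributing per year
    for rec in ordered[1:]:
        common = [y for y in rec if y in combined]
        if len(common) < min_overlap:
            continue  # too little overlap to estimate an offset
        offset = sum(combined[y] - rec[y] for y in common) / len(common)
        for year, value in rec.items():
            shifted = value + offset
            if year in combined:
                n = counts[year]
                combined[year] = (combined[year] * n + shifted) / (n + 1)
                counts[year] = n + 1
            else:
                combined[year] = shifted
                counts[year] = 1
    return combined
```

Using an offset rather than raw values keeps a record that runs systematically cooler (a different instrument or site, say) from introducing a step into the merged series.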


Step 2 is the homogenization step, which includes the urban-rural adjustments as well as the ‘brightness’ adjustment. Ts.txt is split into six latitudinal zones, each processed separately, and stations with fewer than 20 years of data are dropped. The result is six zonal files of individual station records before the urban adjustments and six zonal files of individual station records after them.
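The 20-year cut and the six-zone split in Step 2 can be sketched as follows. The zone boundaries used here (six equal 30° bands) are an assumption for illustration and may not match GISTEMP's actual zones. The urban adjustment itself is not shown.

```python
MIN_YEARS = 20  # Step 2 drops stations with fewer years of data than this


def zone_index(lat):
    """Map a latitude to one of six ASSUMED 30-degree bands.

    0 = 90N-60N, 1 = 60N-30N, 2 = 30N-0, 3 = 0-30S, 4 = 30S-60S, 5 = 60S-90S.
    A latitude exactly on an interior boundary goes to the band south of it."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return min(int((90.0 - lat) // 30.0), 5)


def split_into_zones(stations):
    """stations: iterable of (station_id, latitude, years_with_data).

    Returns six lists of station ids, dropping short records."""
    zones = [[] for _ in range(6)]
    for sid, lat, n_years in stations:
        if n_years < MIN_YEARS:
            continue  # fewer than 20 years: dropped
        zones[zone_index(lat)].append(sid)
    return zones
```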


Step 3 is the global gridding and averaging step, the output of which includes the GLB.Ts.GHCN.CL.PA.txt file which is used in the following discussion.
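The gridding in Step 3 can be illustrated with a simplified distance-weighted average. The sketch makes several assumptions. The weight falls linearly from 1 at the box centre to 0 at the gridding radius, with radius_km set to 1200 by default. Distances are great-circle distances on a sphere of radius 6371 km. Station anomalies are combined with a plain weighted mean. GISTEMP's actual procedure for combining the stations within a box is more involved.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def box_anomaly(center, stations, radius_km=1200.0):
    """Weighted mean anomaly at a box centre (lat, lon).

    stations: iterable of (lat, lon, anomaly). The weight is 1 - d/radius_km,
    so stations at or beyond radius_km contribute nothing. Returns None when
    no station is in range."""
    num = den = 0.0
    for lat, lon, anom in stations:
        w = 1.0 - great_circle_km(center[0], center[1], lat, lon) / radius_km
        if w > 0:
            num += w * anom
            den += w
    return num / den if den > 0 else None
```

Changing radius_km is the knob that separates a 1000 km run from the standard 1200 km one.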

More information regarding this process can be found at the GISTEMP sources web page:

Establishing a Baseline

GISTEMP Step 0 was run using the latest data available at the FTP and HTTP sites listed above. The v2.mean_comb file was saved to a separate directory for use in the following steps. GISTEMP Steps 1-3 were then run to completion, and the resulting global average of gridded anomalies for surface stations, GLB.Ts.GHCN.CL.PA.txt, was saved for later comparisons. This baseline output is compared with the official results on the NASA GISS site:

[Figure: baseline run compared with the official NASA GISS results]

High Altitude Stations

To test the effect of removing high-altitude stations, the v2.inv file was parsed for a list of stations located above 1000 m. These stations were then removed from the v2.mean_comb file, and GISTEMP Steps 1-3 were run to completion. The global average of gridded anomalies for low-altitude surface stations (below 1000 m) is compared with the baseline. Stations with unknown elevation are left in the baseline only.

List of stations below 1000 m (4708)
List of stations above 1000 m (803)

[Figure: low-altitude run (below 1000 m) compared with the baseline]
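The following sketch shows the altitude split and the v2.mean_comb filter. It assumes the v2.inv elevations have already been read into a dictionary of metres, that -999 marks a missing elevation (an assumption), and that the first 11 characters of a v2.mean row are the station ID. The Riyadh ID and its 620 m elevation come from the inventory line quoted in the comments. The other IDs in the usage below are hypothetical. Stations with unknown elevation are returned separately so they can be kept out of the low-altitude run.

```python
MISSING_ELEV = -999  # ASSUMED missing-elevation code in v2.inv


def split_by_altitude(inventory, limit_m=1000):
    """inventory: dict station_id -> elevation in metres (or None).

    Returns (low, high, unknown) sets of station ids."""
    low, high, unknown = set(), set(), set()
    for sid, elev in inventory.items():
        if elev is None or elev == MISSING_ELEV:
            unknown.add(sid)
        elif elev > limit_m:
            high.add(sid)
        else:
            low.add(sid)
    return low, high, unknown


def keep_stations(v2_lines, keep_ids):
    """Keep v2.mean-style rows whose first 11 characters are a kept station id."""
    return [ln for ln in v2_lines if ln[:11] in keep_ids]


# Usage: ids other than Riyadh's are hypothetical.
inv = {"22340438000": 620, "10000000001": 1380, "10000000002": -999}
low, high, unknown = split_by_altitude(inv)
```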

High Latitude Stations

Testing the effect of removing high-latitude stations is more problematic. In a previous version of this test, which used the public CRU gridding and averaging methods, all stations north of 60N and south of 60S were removed from the baseline. GISTEMP fails when this is attempted because of its use of zonal gridding. Instead, 90% of the stations north of 60N and south of 60S were chosen at random from the v2.inv file and removed from the v2.mean_comb file. GISTEMP Steps 1-3 were run to completion, and the global average of gridded anomalies for the remaining surface stations (all stations between 60S and 60N plus the 10% kept poleward of 60 degrees) is compared with the baseline.

List of stations between 60S and 60N (6945)
List of stations poleward of 60S/N, kept (41)
List of stations poleward of 60S/N, discarded (378)

[Figure: run with 90% of stations poleward of 60 degrees removed, compared with the baseline]
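The random thinning of high-latitude stations could be done along these lines. The seed, the strict greater-than-60° cutoff, and the rounding of the discard count are all assumptions, so this sketch will not necessarily reproduce the exact 41/378 split reported above.

```python
import random


def thin_high_latitude(stations, cutoff=60.0, discard_frac=0.9, seed=0):
    """stations: dict station_id -> latitude in degrees.

    Randomly discards discard_frac of the stations poleward of cutoff
    (north or south). Returns (kept_ids, discarded_ids) as sets."""
    # Sort first so the result depends only on the seed, not dict order.
    polar = sorted(sid for sid, lat in stations.items() if abs(lat) > cutoff)
    rng = random.Random(seed)
    n_discard = round(discard_frac * len(polar))
    discarded = set(rng.sample(polar, n_discard))
    kept = {sid for sid in stations if sid not in discarded}
    return kept, discarded
```

Fixing the seed makes the thinned station list reproducible, which matters if the same subset has to be fed through GISTEMP more than once.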

Rural Stations

The Gridded Population of the World, version 3 (gpwv3) data set was used to identify stations located in regions with more than 100 people per square mile. This is a much lower threshold for ‘urban’ than the one used by the US Census Bureau, which defines urban as 1000 people per square mile and an urban cluster as an urban area plus surrounding regions with at least 500 people per square mile. All stations located in regions above 100 people per square mile were removed from the v2.mean_comb file, and GISTEMP Steps 1-3 were run to completion. The global average of gridded anomalies for low-population-density surface stations (below 100 people per square mile) is compared with the baseline.

List of stations below 100 people/sq. mi. (5576)
List of stations above 100 people/sq. mi. (1788)

[Figure: low-population-density run (below 100 people/sq. mi.) compared with the baseline]
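This sketch assumes the gpwv3 densities are in persons per km², while the threshold in the text is per square mile. The conversion therefore matters: 100 people per square mile is roughly 38.6 people per km². The sketch also assumes the per-station densities have already been extracted into a dictionary.

```python
KM2_PER_SQ_MILE = 2.589988  # one square mile expressed in square kilometres


def per_km2_to_per_sq_mile(density_km2):
    """Convert persons per km^2 to persons per square mile."""
    return density_km2 * KM2_PER_SQ_MILE


def split_by_density(densities_km2, threshold_sq_mile=100.0):
    """densities_km2: dict station_id -> persons per km^2 (ASSUMED gpwv3 units).

    Returns (rural, urban) sets using a per-square-mile threshold."""
    rural, urban = set(), set()
    for sid, density in densities_km2.items():
        if per_km2_to_per_sq_mile(density) > threshold_sq_mile:
            urban.add(sid)
        else:
            rural.add(sid)
    return rural, urban
```

Keeping the threshold in the units quoted in the text, and converting the data, avoids mixing per-km² and per-square-mile figures.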

Low Station Count

A corollary to the claim about dropping high-latitude, high-altitude, and rural stations is the claim that the loss of stations in the 1990s has caused a spurious warming trend over the last two decades. A list of stations with data in 2000 or later was extracted from the v2.mean_comb file, along with the converse: a list of stations with data only before 2000. The post-2000 stations were pulled from v2.mean_comb and saved as v2.mean_comb-post2000, and the remainder of v2.mean_comb, without any of the post-2000 stations, was retained as v2.mean_comb-pre2000. Each file was then fed to GISTEMP Steps 1-3 and the outputs compared.

List of stations with data in 2000 or later (1382)
List of stations with no data in 2000 or later (5982)

[Figure: run with pre-2000-only stations compared with run with post-2000 stations]
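The pre/post-2000 partition works on whole stations rather than on rows: a station with any row dated 2000 or later goes entirely into the post-2000 file. This sketch assumes the v2.mean fixed-width layout, with the station ID in the first 11 characters and the year in characters 13 to 16. It also counts a row as "data" even if every month is missing, which a real filter might want to check.

```python
def split_pre_post(v2_lines, cutoff_year=2000):
    """Partition v2.mean-style rows by station.

    A station is 'post' if any of its rows has a year >= cutoff_year;
    all of its rows then go to the post list. Everything else is 'pre'."""
    post_ids = {ln[:11] for ln in v2_lines if int(ln[12:16]) >= cutoff_year}
    pre = [ln for ln in v2_lines if ln[:11] not in post_ids]
    post = [ln for ln in v2_lines if ln[:11] in post_ids]
    return pre, post
```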


There is no evidence that removing high-altitude, high-latitude, or rural (as defined by population density) stations from the GISTEMP input data produces extra warming. Indeed, if any trend is discernible, it is that losing high-latitude stations might introduce a slight cooling effect in the GISTEMP global gridded mean anomaly trend. Nor is there any evidence of extra warming when stations available after 2000 are compared with those available only before 2000. However, the drop in station numbers in the early 1990s does produce a small discontinuity in the difference series at that time, which is probably worth a closer look.
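Each of these comparisons comes down to a difference series and its trend. The minimal sketch below assumes the two runs have already been read into aligned lists of annual anomalies with no missing years. Parsing GLB.Ts.GHCN.CL.PA.txt is not shown. It computes an ordinary least-squares slope of the variant-minus-baseline difference, expressed per decade in the series' own units.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def compare_runs(years, baseline, variant):
    """Return the variant-minus-baseline series and its trend per decade."""
    if not len(years) == len(baseline) == len(variant):
        raise ValueError("series must be aligned year by year")
    diff = [v - b for b, v in zip(baseline, variant)]
    return diff, 10.0 * ols_slope(years, diff)
```

A positive per-decade trend in the difference would indicate that the filtered run warms faster than the baseline. A step in the difference series, such as the one around the early 1990s, shows up in the series itself rather than in the single trend number.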

Further Discussion

Zeke Hausfather @ The Blackboard (guest post)

Lucia @ The Blackboard (a spherical cow)

drj @ Clear Climate Code

Tamino @ Open Mind

Tim Lambert @ Deltoid

I would also like to acknowledge D. Kelly O’Day @ Climate Charts and Graphs from whose site I cribbed the foundation R code for creating these charts.

  1. 2010 March 8 at 7:40 am

    Great work!

    The difference in your first chart, comparing your Linux GISTEMP to official GISTEMP, is almost certainly due to the difference in selecting urban stations. Compare with my chart on the Clear Climate Code blog.

    Makes no difference to the trends and the conclusions, of course.

  2. carrot eater
    2010 March 8 at 7:46 am

    I’m impressed you tackled this using the original software. So it’s better to remove stations from the _comb intermediate file, instead of the original v2.mean?

    I think your input file list might be missing the ushcn file, _F52.avg.

    What is the reason for the slight differences between you and the official results before 1940? The publicly available code probably isn’t the most current, given the switch to using nightlights globally on the UHI. Could that be the reason?

  3. carrot eater
    2010 March 8 at 7:59 am

    I see David Jones answered my question, and my guess was correct.

  4. 2010 March 8 at 8:04 am

    Oops. Actually, in my first draft I had replaced the 9641C_200907_F52.avg with 9641C_201003_F52.avg and recompiled, but decided to revert to avoid over-complicating the changes.

    For my purposes, v2.mean_comb was the better target. Otherwise, additional high-latitude (Antarctica) and high-population (USHCN) stations would have been added back into my data after I had filtered out the ‘undesirable’ station type.

    Also, the Python files in STEP1 are not robust in handling entries in the various ‘control’ files (Ts.strange.RSU.list.IN, Ts.discont.RSU.list.IN, and combine_pieces_helena.in). They fault if a station is listed in one of those files but is not also in the v2.mean_comb file. So I had to add a filter to check the entries in those files against my working version of v2.mean_comb (actually, I checked against a ‘working, filtered’ v2.inv, which I used to create the ‘working, filtered’ v2.mean_comb).

    I should note that the code I’m running uses a 1000 km gridding radius rather than the standard 1200 km.

    I don’t know the reason for the pre-1940 divergence.

  5. carrot eater
    2010 March 8 at 8:12 am

    That might explain why EM Smith had so much trouble doing what you did – deleting stations and seeing what happened. I’m guessing (but don’t know) he was deleting things in v2.mean itself. Though from your description, it would only trip if you removed a station also listed in the strange file or Lihue or St Helena.

  6. 2010 March 8 at 8:36 am

    Thanks for this — especially discussing what the “steps” are.

  7. 2010 March 9 at 12:38 am

    In “Further Discussion” Zeke’s post should link to rankexploits, but instead it links to tamino.

  8. 2010 March 9 at 6:58 am

    Thanks for the correction.

  9. AMac
    2010 March 9 at 8:16 am

    Really clear explanation and graphical display of output. Thanks for sharing all this work.

  10. harrywr2
    2010 March 9 at 9:12 am

    If one looks at the rural data set, one finds at least four major cities in the rural data for Iraq: Basrah, Kirkuk, Mosul and Sulamaniya. They may have been rural in the 1930s, but they surely aren’t rural now.

  11. carrot eater
    2010 March 9 at 9:22 am

    Your urban/rural links point to the same file.

  12. harrywr2
    2010 March 9 at 9:46 am

    The data for Saudi Arabia in the rural file are misplaced as well. Jeddah has a population of 5 million. Riyadh has a population of 4.8 million with a population density of 3,800/sq km.

    Carrot Eater, just change the – on the rural link to a +

  13. 2010 March 9 at 10:08 am

    Updated several c&p errors on data links. My apologies and thanks for the catch.

  14. 2010 March 9 at 10:09 am

    I’ll look at the pop data tonight.
    I suspect that at a wider grid resolution, you will see more familiar pop density numbers.

    Meanwhile, here is a link to Google Maps for the Riyadh location in the v2.inv

    Might explain the low pop density!

  15. carrot eater
    2010 March 9 at 10:27 am

    I think you’ve already shown that the results can vary quite a bit as you change the resolution. Maybe the questions are, if the database finds a sparse spot in a dense neighborhood, do we trust it? And does that matter? How accurate and precise are the coordinates in the GHCN?

  16. 2010 March 9 at 10:34 am

    Yeah. I think Harry has identified a good example of why we have to be careful about how we use popden as a proxy for rural/urban. Too tight, and you identify airports as unpopulated. Too wide, and neighboring sugar cane fields get identified as urban (reference to a blog post on a wxstn in Australia). I bet the Riyadh airport shows up well as a ‘bright’ spot, though.

    Zeke has a post up at Lucia that looks closer at popden-v-brightness:
    In search of the UHI signal

  17. carrot eater
    2010 March 9 at 11:57 am

    I think you’re hitting on the key.

    All methods would have false positives and false negatives. But would high-res popden generally have the same false readings as nightlights? If not, then using the two methods together, using either an ‘and’ or ‘or’ condition, could be sensible.

    The question then is, on what side do you want to usually err, when you do err? That would determine how you proceed.

    I would think you should err on the side of classing too many rural stations as urban, instead of the other way around. But turns out, when you do that, people still get mad.

  18. 2010 March 9 at 12:38 pm


    That Riyadh station is not designated as rural…

    22340438000 RIYADH 24.72 46.73 620 696U 1380FLxxno-9A 1WARM IRRIGATED C

    That’s a bright urban airport. It does have a pop density of only 12.26 though 😛

  19. 2010 March 10 at 3:26 am

    Jeddah: 21.50, -39.20
    2.5: 39.46702
    15: 39.46702
    30: 39.46702
    60: 39.46702
    wiki: 2,921/km2

    2.5: 12.56034
    15: 12.56034
    30: 12.56034
    60: 12.56034

    NYC: 40.75, 74.00
    2.5′: 11170.8
    wiki: 842.3/km2

    Problem with my parsing algorithm?
    Or problem with the data?

    It’s late. But this has got my attention

  20. harrywr2
    2010 March 10 at 11:32 am


    The Riyadh site is in Ron’s rural file.

    The ‘irrigated’ in the GHCN metadata is a bit off. If you can afford to water your lawn at $1 a gallon, then it’s irrigated. The palace has a rather nice lawn. High-end hotels have nice strips of grass out front. The palm trees along major thoroughfares have drip hoses as well. By that definition, Manhattan is ‘irrigated’. I also wouldn’t use ‘warm’ to describe the climate in Riyadh.

    22340438000 RIYADH 24.72 46.73 620 696U 1380FLxxno-9A 1WARM IRRIGATED C

    Accurate population data and accurate metadata for many parts of the world are just plain difficult to come by.

    The UN 2002 population study for Iraq lists Tikrit having a population of 28,000. When US troops showed up there in 2003 it had a population closer to 200,000.
    The number of troops required to ‘maintain order’ is a function of population. The rest is history. Moral of the story, ‘trusting’ someone else’s data can have unfortunate consequences.

  21. harrywr2
    2010 March 10 at 6:18 pm

    Is this the dataset you are using?

    Is this the dataset Ron used for his density data?

    I went there and used their ‘tool’.

    Managed to get a population of 36,000 in 1,800 km2 – population density of 20/km2

    Something tells me the gridded dataset has some ‘flaws’.
