GHCN metadata: Horseshoes and Hand Grenades?

2010 March 10

“Trust but verify”

I’ve been harboring suspicions about the accuracy of much of the GHCN station metadata, and I’ve also seen comments questioning station latitudes, longitudes and altitudes. A primary reason that I’ve been interested in the GPWv3 and GRUMP datasets is to provide an alternate source for urban/rural determination. I’m also looking for original bright/dark data so that I can contrast it with the field in the GISS v2.inv file. Note that I’m not implying deceit or fraud on the part of any of the parties involved in the chain of data ownership – mostly just institutional inertia and cruft.

That said, let’s take a quick peek at Algeria, the first country by country code in the GHCN dataset, while reviewing Peterson and Vose’s 1997 description of the dataset …

Algeria has a surprisingly long list of stations in the dataset, coming in at 50. By comparison, Angola has 17.

Original Sources for Stations

Peterson and Vose, 1997, An Overview of the Global Historical Climatology Network Temperature Database identifies 31 different sources for the stations comprising the GHCN dataset. These datasets varied from the very large (3563 stations) to the very small (sets of 1 each for Qatar, Kuwait, and Ireland). I hope to look closer at these in a later post. The following quotes (except for the DoD ONC description) come from the 1997 paper.

Original Sources for the Metadata

Each station in GHCN was located on Operational Navigation Charts (ONC). With a scale of 1:1 000 000 (1 cm on the map covers 10 km on the earth), ONC were created by the U.S. Department of Defense. Available through the National Oceanic and Atmospheric Administration (NOAA), these charts are used by pilots all over the world. ONC have elevation contours, outlines of urban areas, locations of airports and towns, and for most of the world, a simple vegetation classification. We located every GHCN station on ONC to both quality control station locations and to derive five types of metadata.

From a DoD site:
(ignore the bit about combining with the wetlands data, I was just looking for a good overall description of ONC – surprisingly hard to come by – who deleted the wiki page?)

ONCs are aeronautical charts produced by the United States, Australia, Canada, and the United Kingdom for use as en route medium-altitude (2,000 to 25,000 feet) navigational aids. They show elevation, topography, cultural features, and hydrography. ONCs were integrated with two other independent digital sources–vegetation maps following the UNESCO system (Matthews, 1983) and FAO maps of soil properties (Zobler, 1986)–to produce a high-resolution global database for evaluating natural wetlands and their methane emissions. The ONCs provided fractional inundation data from 1-degree cells of a global map survey and were the most up-to-date and consistent of the three sources. Aerial photography is the fundamental mapping tool, and the large scale of the series provides the potential for representing more realistic detail than do most of the smaller-scale sources used in compiling the other two databases.



Population. Examining the station location on an ONC (Operational Navigation Charts) would determine whether the station was in a rural or urban area. If it was an urban area, the population of the city was determined from a variety of sources. We have three population classifications: rural, not associated with a town larger than 10 000 people; small town, located in a town with 10 000 to 50 000 inhabitants; and urban, a city of more than 50 000. In addition to this general classification, for small towns and cities, the approximate population is provided.

What is not clear is how close a station had to be to a town or city to be classified as “Small town” or “Urban.” Peterson 2003 uses the vague term “associated.” Population data came from a wide range of sources not clearly identified. This sounds to me like a manual review with subjective criteria as to how close a station had to be to a town or city to be tagged ‘S’ or ‘U’.

Wherever possible, we used population data from the then-current United Nations Demographic Yearbook (United Nations 1993). Unfortunately, only cities of 100 000 or more inhabitants were listed in the yearbook. For smaller cities we used population data from several recent atlases. Again, although the atlases were recent, we do not know the date of source of the data that went into creating the atlases.

Station id: 10160531001
Station country: ALGERIA
Station latitude: 34.80
Station longitude: -1.30
Station altitude: -999
Station rural/urban: U
Station airport flag: x
Station vegetation: WARM CROPS
GPWv3 population density: na


Airport locations. Airports are, of course, clearly marked on ONC charts. If a station is located at an airport, this information along with the distance from its associated city or small town (if present) are included as part of GHCN metadata.

But not all airports are the same …

Station id: 10160559000
Station name: EL OUED
Station country: ALGERIA
Station latitude: 33.50
Station longitude: 6.12
Station altitude: 63
Station rural/urban: U
Station airport flag: A
Station vegetation: HOT DESERT
GPWv3 population density: na


Topography. ONC make detailed topography available to pilots. We used this information to classify the topography around the station as flat, hilly, or mountainous. Additionally we differentiated between mountain valley stations and the few mountaintop stations that can provide unique insights into the climate of their regions.

IN AMENAS has been classified as Hilly

Coastal Locations

Coastal locations. Oceanic influence on climate can be significant, so these metadata include (a) if the station is located on an island of less than 100 km2 or less than 10 km in width at the station location, (b) if the station is located within 30 km of the coast it is labeled as coastal and the distance to the coast is provided, and (c) if the station is adjacent to a large (greater than 25 km2) lake, that too is noted because it can have an influence on a station’s climate.

Oceanic influence is more significant on some coastal stations than on others …

Station id: 10160355000
Station name: SKIKDA
Station country: ALGERIA
Station latitude: 36.93
Station longitude: 6.95
Station altitude: 7
Station rural/urban: U
Station airport flag: x
Station vegetation: WARM DECIDUOUS
GPWv3 population density: na


The vegetative data comes from two sources: ONC and Olson 1983.

Vegetation. If the station is rural, the vegetation for that location is documented. The classifications used on the ONC are forested, clear or open, marsh, ice, and desert. Not all ONC had complete vegetation data, so these metadata are not available for all stations. An additional source of vegetation data is included in GHCN metadata: the vegetation listed at the nearest grid point to each station in a 0.5° × 0.5° gridded vegetation dataset (Olson et al. 1983). This vegetation database creates a global vegetation map of 44 different land ecosystem complexes comprising seven broad groups. These metadata do not indicate the exact vegetation type at the station location, but they do provide useful information. In particular, an ecosystem classification can be used to some degree as a surrogate for climate regions since vegetation classes depend, to a large extent, on climate.

Station id: 10160620000
Station name: ADRAR
Station country: ALGERIA
Station latitude: 27.88
Station longitude: -0.28
Station altitude: 263
Station rural/urban: S
Station airport flag: x
Station vegetation: COOL FOR./FIELD
GPWv3 population density: na

Full pages for Algeria and Angola

Posting these on WordPress is not ideal. I have to make manual edits when I add, update, or correct my parsing scripts. But for now, you can take a look at what I’m looking at here:

http://rhinohide.wordpress.com/ghcn-station-map/101-station/ (Algeria)
http://rhinohide.wordpress.com/ghcn-station-map/102-station/ (Angola)


I’m not sure who all of the consumers of GHCN metadata are, but I hope they are doing some QC of their own. The biggest issue I have to date is that it does me no good to use population density data gridded to 2.5′ if the stations have a lat/lon error of 5′. And if my suggestion that much of the metadata was constructed using manual inspection of the various ONC maps is correct, then it is probably past time to replace that with an automated method. Not that I’m the first to suggest it, and I’m pretty sure the NOAA wonks are working it – maybe someone can give me a few hints of what’s coming next in GHCNv3.
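To put rough numbers on that mismatch, here is a minimal sketch in Python (it is not one of my parsing scripts). It assumes a spherical Earth with a mean radius of 6371 km, and the function names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def arcmin_to_km(arcmin, latitude_deg=0.0, along="lat"):
    """Approximate ground distance covered by an angular offset in arcminutes.

    along="lat" measures north-south; along="lon" measures east-west,
    which shrinks with the cosine of the latitude.
    """
    km = EARTH_RADIUS_KM * math.radians(arcmin / 60.0)
    if along == "lon":
        km *= math.cos(math.radians(latitude_deg))
    return km


def cells_spanned(error_arcmin, cell_arcmin=2.5):
    """How many grid cells of the given size an angular error can cross."""
    return error_arcmin / cell_arcmin


if __name__ == "__main__":
    print(f"5' north-south error: {arcmin_to_km(5):.1f} km")
    print(f"2.5' cell height:     {arcmin_to_km(2.5):.1f} km")
    print(f"cells spanned:        {cells_spanned(5):.1f}")
```

Under those assumptions a 5′ error is a bit over 9 km north-south, twice the height of a 2.5′ cell, so a misplaced station can easily pick up the population density of a neighbouring cell.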

See also Peter O’Neill http://oneillp.wordpress.com/

  1. carrot eater
    2010 March 10 at 4:20 am | #1

    “The biggest issue I have to date is that it does me no good to use population density data gridded to 2.5′ if the stations have a lat/lon error of 5′.”

    Yup. That’s what I was worried about – “How accurate and precise are the coordinates in the GHCN?”

  2. Peter O’Neill
    2010 March 10 at 9:54 am | #2

    The problem is, as they say, even worse than we thought. One of the consumers of GHCN metadata is of course Gistemp, and the implications of imprecise lat/lon for Gistemp are now considerably greater, following the change last month to use of satellite-observed nightlight radiance to classify stations as rural or urban throughout the world rather than just in the contiguous United States as was the case previously. As about a quarter of all GHCN stations changed classification as a result, this is certainly not a minor change. But how can you judge nightlight radiance of stations which are not where you believe them to be?

    I notified Gistemp two weeks ago of problems even greater than those you have outlined above, but have not as yet received any reply (surprisingly, as I have previously received responses within one working day when passing on code errors or other comments). In view of this I also emailed Russell Vose at NOAA directly late last week, rather than assuming he would be contacted by Goddard. He has replied, appreciating feedback, is passing on the information to those working on a new version of the temperature dataset, and indicates that “Hopefully some of these can be fixed quickly, but others may take a little longer”.

    I had intended to wait until the next Gistemp update appears to allow them time to consider how to handle this, even without the courtesy of a reply, but as this is now out in the open, here are some further examples.

    Consider the following Google Earth image:

    The yellow pushpin on the right shows the location of the previously urban Spanish station at Albacete, as found in v2.inv – certainly a location which could be expected to be dark at night. The two pushpins on the left show stages in finding the true location, an interesting story, but one I will continue in a following comment, as I would first like to see how WordPress handles my attempt to post the Google Earth image above.

  3. carrot eater
    2010 March 10 at 11:33 am | #3

    There was an article called “Using Google Earth to evaluate GCOS weather station sites” in 2008, author Ian Strangeways, where this issue was mentioned. He was trying to use Google Earth to find the stations, and found the coords weren’t good enough.

  4. Peter O’Neill
    2010 March 10 at 11:34 am | #4

    Apologies if this gets posted twice. The previous attempt seems to have failed.

    To continue that interesting story now (the WordPress preview by the way seems to crop the right hand side of the image, so that the right hand pushpin is almost lost, but the image link works correctly, and the pushpin is fully visible there).

    The v2.inv (Gistemp version with radiance) for Albacete is:

    64308373001 ALBACETE SPAIN 39.00 1.80 43 0U 83FLxxno-9x-9WATER A 0

    and at first I thought that the problem was simply an east/west error, and that the longitude for Albacete should be -1.80, not +1.80, and this did relocate the station closer to Albacete, as the right hand pushpin in this Google Earth image shows:


    but something still seemed wrong. Then I looked at another Spanish station, Soria:

    64308171001 SORIA SPAIN 41.70 1.20 1068 424R -9HIFOno-9x-9MED. GRAZING A 14


    which showed a similar eastward shift, but one which could not be explained by a change of sign. Then I recalled that the maps I had used many years ago walking in the Picos de Europa in Spain had shown longitude relative to Madrid : 3.6879 degrees west of Greenwich, and all became clear. The v2.inv longitudes for these two stations were relative to Madrid, not Greenwich, and located on mapping other than ONC!
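    A minimal sketch of that correction in Python, assuming the v2.inv value is degrees east of Madrid and using the 3.6879 degree offset quoted above. The function and constant names are illustrative only.

```python
MADRID_WEST_OF_GREENWICH_DEG = 3.6879  # offset quoted in the comment above


def madrid_to_greenwich(lon_madrid_deg):
    """Convert a longitude measured east of Madrid to one measured east of Greenwich."""
    return lon_madrid_deg - MADRID_WEST_OF_GREENWICH_DEG


if __name__ == "__main__":
    # Longitudes as they appear in the two v2.inv records quoted above
    for name, lon in (("ALBACETE", 1.80), ("SORIA", 1.20)):
        print(f"{name}: v2.inv {lon:+.2f} -> Greenwich {madrid_to_greenwich(lon):+.2f}")
```

    Both corrected values come out negative, that is, west of Greenwich, which a simple change of sign cannot produce for Soria.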

    These however were the only two of the Spanish stations referenced relative to Madrid; the remaining stations seem to have been referenced relative to Greenwich. But there were still a couple of trivial (though perhaps not so trivial politically) errors among these “Spanish” stations.

    64308160000 TENKODOGO 11.77 0.38 -999 288U 449HIxxno-9A10WARM GRASS/SHRUBA 0


    64308330001 SINTRA/GRANJA 38.80 -9.30 133 196U 1100HIxxCO 5A10COASTAL EDGES C 32

    are located in Burkina Faso and Portugal respectively. Portugal already has a genuine historic border dispute with Spain (Olivença), without adding an imaginary one.

    I located three major errors in about ten minutes of starting to look for errors, relocations so distant that no knowledge of the true coordinates was needed to find them. I had already noticed some time ago that the two stations at Cherbourg and Cherbourg airport, an area I am familiar with, had been relocated out into La Manche/English Channel, albeit not as extremely as these three stations, and it was in fact this which prompted me to check other stations now, and suggested an easy way to check at least some stations – I simply imported the coordinates of all European stations into Microsoft AutoRoute, and looked at the relatively few pushpins which appeared to be out at sea, and checked these to see if there was a “supporting island”. Albacete, Limassol and Isola Gorgona were obviously wrong. There were also some others closer to shore, and in areas such as Turkey where the map detail was insufficient.

    61017600001 LIMASSOL CYPRUS 34.70 32.00 8 0U 82HIxxCO 1x-9WATER A 0
    62316197001 ISOLA GORGONA 42.40 9.90 254 0R -9HIxxCO 1x-9WATER A 0


    and of course these may be repeated elsewhere – I could only examine European waters in this way. Considering Albacete led me to find Soria, and then to look at the stations located at airports, a group of stations where the general location of the station could be inferred from the airport boundaries, and found that a considerable number of the coordinates in v2.inv were located at a considerable distance from the airport boundary, where I have regarded “a considerable distance” as “by multiples of the length of the main runway”, and considerably more than the precision implied by coordinates given as degrees to two decimal places. I generated KML files setting up tours of these stations in Google Earth, with pushpins and identifying information at each station location. Among examples of dubious coordinates from Algeria are:


    and there are many more countries in the metadata file. For anyone interested in examining these stations themselves, I will shortly clean up, comment and then post my R script to generate a KML tour.

    edit by rb: just a note that my # of links spam meter is currently set at six

  5. 2010 March 10 at 11:43 am | #5

    At least NOAA is consistent about Skikda’s location – although they recognized the location as a (former according to wiki) airport.
    (DABP) 36-56N 006-57E 7M

    A-Z world airports is even worse. This is Annaba (DABB)

    Wiki gets it right (I’m assuming Skikda metar comes from DABP)
    36°51′43.66″N 006°57′03.14″E

  6. Peter O’Neill
    2010 March 10 at 11:50 am | #6

    The airport locations shown in Google Earth are certainly often poor, but the runways shown seem to match those (some European only) I have compared with other mapping. I had wondered whether the coordinate systems were consistent WGS84, but even differing coordinate systems should not give displacements such as those I have seen (assuming consistent prime meridian at least)

    I have already posted my continuation, although it has not yet appeared. I will check in a few hours in case this is not a moderation delay, but instead a broken html tag used.

  7. 2010 March 10 at 12:00 pm | #7

    Link to “Using Google Earth to evaluate GCOS weather station sites”

  8. 2010 March 10 at 12:54 pm | #8

    Major Kudos for looking at this Ron.

    Like I said on Lucia’s site it would be very cool to do this as a community project,
    kinda like surface stations where the final output would be a verified set
    of metadata, publicly available for each station in GHCN.

  9. 2010 March 10 at 12:55 pm | #9

    If the Irish station is the observatory, there is a rich history on it.

  10. 2010 March 10 at 12:59 pm | #10

    For USHCN they ended up being pretty good, but to tell you had to visit all the sites. For ROW?
    Who knows.

  11. 2010 March 10 at 1:18 pm | #11

    Ugh, I was hoping that the lat/lon coords were reasonably accurate. Oh well, we might have to limit our site characteristics analysis to USHCN for a bit, and hopefully they will correct some of the more egregious divergences in GHCN v3…

    Someone tell NOAA to hire some interns to spend a summer Google Earthing the whole GHCN networks :P

  12. 2010 March 10 at 1:33 pm | #12

    I guess the real question is: what is the mean and st dev for the lat/lon error of GHCN stations? We could probably adjust the resolution of the pop density and nightlight data to compensate, at the loss of precision. Just go conservative and mark anything close to dense/bright as urban.

  13. carrot eater
    2010 March 10 at 1:48 pm | #13

    That decision is going to be made in the end, no matter what. On which side shall you err? Err on marking too many things as urban. And then Eschenbach will flip out, and tell you that all your results have to be thrown out.

  14. Peter O’Neill
    2010 March 10 at 5:23 pm | #14

    Just to clarify, it is the airport location SYMBOLS which are often poorly located in Google Earth. And the reference to “consistent prime meridian” should now be clear as the remainder of my earlier posting has now appeared.

  15. Peter O’Neill
    2010 March 10 at 6:52 pm | #15

    Trying once again to remove the remaining tag misinterpretations. (Please snip the earlier code when you get the chance)

    # Download v2.inv once and save a local copy; comment out the next three lines afterwards
    url <- "ftp://data.giss.nasa.gov/pub/gistemp/GISS_Obs_analysis/GISTEMP_sources/STEP1/input_files/v2.inv"
    fred <- readLines(url)
    write(fred, "d://v2.inv")
    # Replace path and/or file name above and below as appropriate
    # ---------------------------------------------------------------------
    outfile <- "d://GoogleTour.kml"
    url <- "d://v2.inv"
    fred <- readLines(url)  # read the local copy so the script still runs once the download is commented out
    usePreviousLightsForUS <- TRUE
    tourName <- "Airports in file order"
    # ---------------------------------------------------------------------
    # Sample records (fixed-width columns):
    #10160355000 SKIKDA                          36.93    6.95    7   18U  107HIxxCO 1x-9WARM DECIDUOUS  C   49
    #10160360000 ANNABA                          36.83    7.82    4   33U  256FLxxCO 1A 7WARM CROPS      C   12
    # KML header and start of the tour playlist
    write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", outfile)
    write("<kml xmlns=\"http://www.opengis.net/kml/2.2\"", outfile, append=TRUE)
    write("  xmlns:gx=\"http://www.google.com/kml/ext/2.2\">", outfile, append=TRUE)
    write("<gx:Tour>", outfile, append=TRUE)
    write(paste("<name>", tourName, "</name>", sep=""), outfile, append=TRUE)
    write("<gx:Playlist>", outfile, append=TRUE)
    nRU <- 0  # count of Rural -> Urban changes
    nUR <- 0  # count of Urban -> Rural changes
    lats <- c()
    lons <- c()
    nmes <- c()
    for (i in seq_along(fred)) {
      if (substr(fred[i],82,82) == "A") { # select airports only
        mark <- ""
        idx <- as.integer(substr(fred[i],104,106)) # radiance
        if (substr(fred[i],68,68) == "R") { # rural
          if (usePreviousLightsForUS) {
            if (substr(fred[i],102,102) == " ") {
              if (idx > 10) { mark <- "?"; nRU <- nRU + 1 }
            } else {
              if (substr(fred[i],102,102) == "1") {
                if (idx > 10) { mark <- "?"; nRU <- nRU + 1 }
              } else {
                if (idx < 11) { mark <- "*"; nUR <- nUR + 1 }
              }
            }
          } else {
            if (idx > 10) { mark <- "?"; nRU <- nRU + 1 }
          }
        } else { # urban or periurban
          if (usePreviousLightsForUS) {
            if (substr(fred[i],102,102) == " ") {
              if (idx < 11) { mark <- "*"; nUR <- nUR + 1 }
            } else {
              if (substr(fred[i],102,102) == "1") {
                if (idx > 10) { mark <- "?"; nRU <- nRU + 1 }
              } else {
                if (idx < 11) { mark <- "*"; nUR <- nUR + 1 }
              }
            }
          } else {
            if (idx < 11) { mark <- "*"; nUR <- nUR + 1 }
          }
        }
        lat <- gsub(" ","",substr(fred[i],44,49))
        lon <- gsub(" ","",substr(fred[i],51,57))
        # Pushpin label: airport code, R/S/U code, brightness code, (lat,lon), radiance, flag, name
        o <- paste(substr(fred[i],82,82), substr(fred[i],68,68),
          gsub(" ","",substr(fred[i],101,102)), "(",
          lat, ",", lon, ")",
          gsub(" ","",substr(fred[i],104,106)), mark,
          "_",substr(fred[i],13,42), sep="")
        o <- sub("[ ]*$", "", o)
        o <- sub("[ ]{2,}", " ", o)
        lats <- c(lats, lat)
        lons <- c(lons, lon)
        nmes <- c(nmes, o)
        write("<gx:FlyTo><LookAt>", outfile, append=TRUE)
        write(paste("<longitude>", lon, "</longitude><latitude>", lat, "</latitude>",
          sep=""), outfile, append=TRUE)
        # The original statement here was lost when the comment was cleaned up.
        # Reconstruction (an assumption): close the view with the 12000 value described in the follow-up comment.
        write("<altitude>0</altitude><range>12000</range></LookAt></gx:FlyTo>",
          outfile, append=TRUE)
      }
    }
    cat(nRU, " Rural -> Urban changes", "\n");cat(nUR, " Urban -> Rural changes", "\n")
    write(paste("</gx:Playlist></gx:Tour><Folder><name>", tourName, "</name>", sep=""), outfile, append=TRUE)
    write("<Style id=\"pushpin\"><IconStyle><Icon><href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href></Icon></IconStyle></Style>",
      outfile, append=TRUE)
    for (i in seq_along(lats)) {
      # styleUrl needs a leading "#" to refer to the style defined in this file
      write(paste("<Placemark id=\"pin", i, "\"><name>", nmes[i], "</name><styleUrl>#pushpin</styleUrl>", sep=""), outfile, append=TRUE)
      write(paste("<Point><coordinates>", lons[i], ",", lats[i], ",0</coordinates></Point><Camera>", sep=""), outfile, append=TRUE)
      write(paste("<longitude>", lons[i], "</longitude><latitude>", lats[i], "</latitude>", sep=""), outfile, append=TRUE)
      # Lost in the same cleanup; reconstructed (an assumption) with the 12000 altitude
      write("<altitude>12000</altitude></Camera></Placemark>",
        outfile, append=TRUE)
    }
    write("</Folder></kml>", outfile, append=TRUE)

  16. Peter O’Neill
    2010 March 10 at 7:04 pm | #16

    This code immediately above now verified: copied and pasted from post above, and run to give identical output.

  17. 2010 March 10 at 7:48 pm | #17

    Peter, thanks for the code post, and hopefully I didn’t whack something I should not have while doing the clean up. If so, you have my apologies.

  18. Peter O’Neill
    2010 March 11 at 3:04 am | #18

    Ron Broberg :
    Peter, thanks for the code post, and hopefully I didn’t whack something I should not have while doing the clean up. If so, you have my apologies.

    The surgery was slightly too radical. Replacing the comments:

    Long lines may wrap, but I hope are clear enough. As set up here, the current GISS v2.inv (i.e. with radiance values added) is downloaded and saved locally, then all the airport stations are extracted to form a Google Earth tour. The default time between features and wait at features set in Google Earth are used. Pausing the tour during the wait at a feature allows zooming in or out to examine surroundings, after which the tour may be resumed. The altitude value used, 12000, was chosen to allow runways to be easily visible if within the view window, but zooming out may be necessary to find runways where the v2.inv coordinate errors are larger.

    Each station has information prepended as it appears at the pushpin, as:

    ||| | |
    ||| | radiance
    ||| lat,lon
    ||GHCN brightness code: A/B/C, + 1/2/3 index for Canada, Mexico, US
    |urban/rural code: R/S/U
    airport code: x/A

    In addition, a flag, * or ?, is added where the station has changed from urban to rural or rural to urban (peri-urban being considered as urban for this purpose). A variable usePreviousLightsForUS is set TRUE to classify “US” stations before the change according to the previous Gistemp rules (155 Rural -> Urban changes, 437 Urban -> Rural changes for airport stations). Setting this to FALSE allows the previous classification to be determined for all stations according to the R/S/U codes for comparison (and would show 282 Rural -> Urban changes, 374 Urban -> Rural changes for airport stations). File names, a tour name, and the usePreviousLightsForUS value are all set at the beginning of the script.

  19. Peter O’Neill
    2010 March 11 at 3:08 am | #19

    Spacing the pushpin description to indicate the radiance value more clearly:

    |||  |          |
    |||  |          radiance
    |||  lat,lon
    ||GHCN brightness code: A/B/C, + 1/2/3 index for Canada, Mexico, US
    |urban/rural code: R/S/U
    airport code: x/A
  20. 2010 March 11 at 7:20 am | #20

    Peter, I didn’t touch the long post above, so if it’s broke, it’s because of limitations in the comments. If you want to send me a short write-up, including links to pics and the code, I’ll give you a page and we can pull you out of the comments. If you get a wordpress account, I can give you access to post your own work.

  21. harrywr2
    2010 March 11 at 9:21 am | #21

    I’ve spent a couple of hours playing around with the GPW data set using this tool.

    In much of the world administrative districts are the primary local government. Similar to the concept of a county in the US.

    The data resolution in the Gridded Population Database for most of the middle east appears to be population density by administrative district.

    The administrative district that covers Riyadh, Saudi Arabia is huge. Of course 99% of it is uninhabitable desert. If I use the Gridded data tool and mark out the administrative district that includes Riyadh and include 400,000 km2 of uninhabited desert then I get close to the population of the City of Riyadh. Same is true for Mosul, Iraq; Damascus, Syria; etc.

    So in many places in the world the gridded data merely reflects the population density of whatever level population records are being kept at.

    I also played around with the tool in Washington state and found a few people living in uninhabited Wilderness. We have population records at the county and city level. The data for the cities appears accurate. The data for the ‘unincorporated’ portions of the county appears homogenized.

    Simplified theory of what I believe the method used for Washington state was:

    Within a county we have incorporated(cities) and unincorporated areas. We know the total population of the county and the land area of the county. We also know the population of the cities and the land area of the cities.

    By subtracting the land area and population of the cities from the county total we can determine how many people are in the unincorporated parts of the county. We then do simple division to get an average population density for the unincorporated parts of the county.
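    That subtraction can be sketched in a few lines of Python. This only illustrates the guess described above, not GPW’s documented method, and the county and city figures are hypothetical.

```python
def unincorporated_density(county_pop, county_area_km2, cities):
    """Average density of the part of a county outside its cities.

    cities is a list of (population, area_km2) pairs for the incorporated cities.
    """
    city_pop = sum(pop for pop, _ in cities)
    city_area = sum(area for _, area in cities)
    rest_area = county_area_km2 - city_area
    if rest_area <= 0:
        raise ValueError("cities cover the whole county")
    return (county_pop - city_pop) / rest_area


if __name__ == "__main__":
    # Hypothetical county: 100,000 people on 5,000 km2, with two cities
    cities = [(60_000, 50.0), (20_000, 30.0)]
    density = unincorporated_density(100_000, 5_000.0, cities)
    print(f"{density:.2f} people/km2 assigned to every unincorporated cell")
```

    Every cell outside the cities gets the same average value, whether it covers a small town or empty mountainside.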

    The result being the gridded data tool shows people living 20 miles from the nearest goat path in the Cascade Mountain Range.

    The same problem shows up for Ramstein AB in Germany. The administrative district includes the Black Forest, and the US soldiers don’t show up in the census. Using population density figures from the gridded database, Ramstein AB shows up as rural.

    If the administrative area includes both rural and urban areas, the data is homogenized, yielding a false rural.

  22. 2010 March 11 at 11:43 am | #22

    By KML tour what do you mean? I think I know, just want to be sure.

  23. 2010 March 11 at 11:46 am | #23

    That would be a great bit of advice for the “cru” redo team as well. Although I think that NOAA should probably just hire some mapping/imagery/GIS/database people to create a decent
    verified set of metadata.

  24. Peter O’Neill
    2010 March 11 at 12:06 pm | #24

    KML is the scripting language for Google Earth. The generated KML file (which is plain text, an XML file which you can open and read) contains two sections. The first section is a set of latitude/longitude pairs to be “flown to” in sequence (the tour), the second section contains the placemarker data, lat/long, viewing altitude and orientation, label, to generate pushpins at the stopping points. By default, opening and running the file in Google Earth will “fly” between points, waiting 4 seconds at each point. The “flying time” is 10 seconds, and speeding this up too much requires preloading of mapping data, resource heavy, rather than loading on the fly.

  25. 2010 March 11 at 12:40 pm | #25

    Agreed. I’m wondering how many stations you get if you run a really tight screen:

    no airports, no urban , etc.

    The other thing to see then is how much of the earth fits that description?

    For example: you look at night lights and at 5 minutes resolution you have say XXK elements of nightlights that are dark. Say it’s 80% of all the land (probably more).

    Within that 80%, let’s say you have 1000 stations. Now, divide those stations into 2 piles. Estimate the rate of temperature growth in one pile. The other pile should be close, right? The idea being you have say 20K grids that are dark. You have stations in 1000 of these grids. Can you estimate the trend in the unsampled grids from the sampled?
    So you hold out 500. predict the other 500.

    Make any sense?

    The other thing to help people get a sense of this is a graphic of how much of the land is actually “bright” dim and dark. With urban centers the effect on rainfall outside the city has been noted up to 40 nm away. So, you could set up a screen that says the following:

    Location = dark.
    closest bright pixel is more than 40nm away.
    not sure if that’s feasible.

  26. 2010 March 11 at 11:48 pm | #26

    I’ve been looking at the GPW some more the last couple of nights.

    One problem with using popden as a proxy for urban in tight resolutions is that at 2.5′, business districts and airports can come up low pop densities. A lower resolution, like 15′ or 30′, can provide a better indicator when it comes to areal popden.

    And how wide of a circle do we want to draw around an urban center? Should the urban/rural border be drawn right where the concrete stops? Within 10 miles of the urban area? 20? Downwind? Don’t mean to overcomplicate this, but it’s at least a little complicated.

  27. harrywr2
    2010 March 12 at 9:33 am | #27


    I think the grid cells for the GPW are frequently smaller than the input resolution. As far as I can tell it’s all pretty much average population density within the boundaries of whatever government entity has population records.

    In Saudi Arabia, Oman, Iraq, Syria and Turkey that is by ‘administrative district’. So one ends up with a population density of 12/km2 in downtown Riyadh, since the administrative district contains 400,000 km2 of uninhabited desert.

    For those places that have population records at the municipal level, then it’s average population density within the borders of the municipality and outside the municipalities its average population density for the county minus the municipalities.

    In Washington State if I try to find the population of Winthrop, Washington which is ‘unincorporated’, I get the population density of rural Chelan County when I know for a fact that Winthrop is a small town with more than 7 residents. But since it is ‘unincorporated’ there are no formal boundaries and no formal population records.

    If I eyeball Puerto Princesa City, Phillipines(in the data file as Puerto Prince) it is clearly urban, but the population density appeasr to be the average for the entire Island of Palawan.

    I don’t believe one can answer the question ‘whats the population density in the vicinity of the weather station’ to any degree of accuracy using the GPW. It just doesn’t have the necessary input resolution. It’s probably reasonable to say it won’t yield a false urban but not reasonable to say it won’t yield a false rural.
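
    The Riyadh figures can be worked through as simple arithmetic. The sketch below uses only the two numbers quoted in this comment (12/km2 and 400,000 km2); the 1,000 km2 built-up footprint is a purely hypothetical value chosen to show how far a district-wide average can sit from the density near a station.

```python
def average_density(population, area_km2):
    """People per square kilometre, averaged over the whole reporting unit."""
    return population / area_km2


district_area_km2 = 400_000   # from the comment
reported_density = 12         # people per km2, from the comment
implied_population = reported_density * district_area_km2

# HYPOTHETICAL: suppose nearly everyone lives in a 1,000 km2 built-up core.
assumed_core_area_km2 = 1_000
core_density = average_density(implied_population, assumed_core_area_km2)

print(implied_population)  # 4800000
print(core_density)        # 4800.0
```

    Under that assumption the same population reads as 12/km2 or 4,800/km2 depending only on the boundary used, which is the false-rural risk described above.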

  28. carrot eater
    2010 March 12 at 11:09 am | #28

    If the thing is indeed based on administrative district, that makes it really problematic for the reasons illustrated by harry. This can be checked by picking a huge admin district, and seeing if the popden changes as you change resolution, within that district.
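
    One way to sketch that check, assuming the densities for a single district have already been extracted into a small 2-D list of equal-area cells (the function names here are hypothetical, not part of any GPW tooling):

```python
def coarsen(grid, factor):
    """Average a 2-D grid over factor x factor blocks (equal-area cells assumed).

    Rows and columns that do not fill a complete block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def resolution_profile(grid, factors=(1, 2, 4)):
    """Map each coarsening factor to the (min, max) cell density at that resolution."""
    profile = {}
    for f in factors:
        values = [v for row in coarsen(grid, f) for v in row]
        profile[f] = (min(values), max(values))
    return profile


def is_flat(profile, tol=1e-9):
    """True if no resolution shows any spread, i.e. the district is one averaged value."""
    return all(hi - lo <= tol for lo, hi in profile.values())
```

    If `is_flat` holds for a district that visibly contains a city, the grid is carrying one averaged number for the whole district rather than real sub-district structure.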

  29. 2010 March 12 at 12:36 pm | #29

    I’ve been paying close attention to your comments and I’m coming to much the same conclusion.

    As a proxy for urban – it is not sufficient on a standalone basis due to a false sense of what the accurate resolution is. It might be pretty accurate in most of the US and Europe, but they complain of not being able to afford purchase of the best data in AU and NZ, and much of the rest of the world does not have accurate tight resolution.

    But can I get a coarse answer to “Urban or Rural” – the answer might depend on where and ‘how coarse.’ What I’m groping for in the dark is a tool to examine the R/S/U designation in GHCN. GPW might still be good enough to use for that determination – or good enough with a supplementary method. Or maybe not.

  30. carrot eater
    2010 March 12 at 1:58 pm | #30

    Have you contacted the people who provide this product? I’m sure they’d be pleased that somebody’s using it for something, so maybe they could give some guidance.

    In this case, it could be good for Zeke to update his UHI post at Lucia’s with a note, briefly describing the issue.

  31. 2010 March 12 at 3:10 pm | #31

    20 to 40nm downwind. Wikipedia reference for the effect on rainfall. FWIW.

  32. Peter O’Neill
    2010 March 12 at 7:58 pm | #32

    The image above (if these tags work correctly!):
    Google Earth image:

  33. Peter O’Neill
    2010 March 12 at 8:01 pm | #33

    But unfortunately WordPress changes the html img tag to a link.

  34. 2010 March 12 at 8:38 pm | #34

    Yeah, it appears that the pop density data is somewhat less useful for the rest of the world. I’ll add an update to the UHI post when I have a chance (only my work IP address has posting rights over there at the moment).

    That said, the U.S. data appears pretty solid, so we can start fiddling around with USHCN data. The U.S. is somewhat of an odd case relative to most of the world since little of the urbanization is recent. The metadata quality is better, however, and nearly all of the stations have long continuous records.

    As a fun example of confounding factors, most of the difference in trend between 100 pop density stations appears to disappear when you apply TOBs corrections.

  35. 2010 March 12 at 8:39 pm | #35

    Arg, the less than and greater than signs strike again. That was supposed to be:

    “As a fun example of confounding factors, most of the difference in trend between less than 10 pop density stations and greater than 100 pop density stations appears to disappear when you apply TOBs corrections.”
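
    For readers who want to try the same comparison, here is a rough sketch of one way to compute such a trend gap. It is not the method used for the UHI post; the station dict keys (`popden`, `years`, `temps`) are hypothetical, and the thresholds 10 and 100 are simply the ones named in the comment, with units as given in the population data.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_gap(stations, low=10, high=100):
    """Mean trend of stations with popden > high minus mean trend of those with popden < low.

    Each station is a dict with hypothetical keys 'popden', 'years', 'temps'.
    Stations between the two thresholds are ignored.
    """
    low_trends = [ols_slope(s["years"], s["temps"]) for s in stations if s["popden"] < low]
    high_trends = [ols_slope(s["years"], s["temps"]) for s in stations if s["popden"] > high]
    return sum(high_trends) / len(high_trends) - sum(low_trends) / len(low_trends)
```

    Running it once on raw series and once on TOB-adjusted series would show how much of the gap the adjustment removes.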

  36. carrot eater
    2010 March 13 at 6:19 am | #36

    I can’t say I expected TOB to correlate with that station classification.

    Have you seen Peterson’s 2003 paper on UHI? I think you mentioned it in passing once, so maybe. In that one, they go through and eliminate various factors like that. But if I recall, that was a static analysis like Spencer, not trend like yours. Maybe the memory is fuzzy though.

  37. 2010 March 13 at 11:50 am | #37

    CE: Peterson 2003. I have some problems with that paper. Let me schematize the argument.

    Peterson looks for a difference between urban and rural by doing a pairwise comparison
    (nice method).

    At the very start of the paper he notes that according to theory there should be a difference.

    He notes a variety of studies that do in fact find a difference, with urban being warmer.

    He notes that rural stations (Wang) may not actually be rural or that they can be contaminated.

    He decides to use a version of nightlights to determine urbanity.

    He also notes that prior studies have not applied all the adjustments required.

    He applies a TOBS adjustment (but not Karl’s, as I recall). I recall him noting that it impacted
    rural stations more than urban, but please check me.

    He applies instrument adjustments (for MMTS and the HO-83, I think).

    He looks at the metadata and tries to eliminate microsite bias: he removes two stations
    that are on rooftops. That’s the only thing he can check in the metadata.

    His station list is NOT a USHCN station list (see Climate Audit posts on this; only a portion are USHCN).

    When his test saw no difference he concluded the following:

    We can POSTULATE (his word) that there is no difference because urban stations are located
    in cool parks.

    This cool park hypothesis is subject to verification. Part of why I liked Anthony’s work
    was that the cool park thesis could actually be tested (kinda). That is, Peterson actually argued
    (read the paper) that the urban sites had to be well sited because the guidelines called for them to be well sited.

    When we don’t see a UHI signal comparing rural to urban, then we have a mystery.
    We know that the urban environment has an artificial warming signal. We also know that
    this warming signal is not uniform and in fact that there can be cool pockets. They tend
    to be very small. We have some evidence that rural sites can also have bias.

    When we don’t find a difference between urban and rural, that’s a mystery. Peterson says
    this, Jones says this, Parker says this. They all rely on the cool park explanation.

    When urban-rural is not positive I think we have these things to look at.

    1. Are the urban sites FACTUALLY in cool parks? That’s a site survey or IR survey question.

    2. Are the rural sites FACTUALLY rural? That’s a site survey and a detailed check of other
    rural sites in the area. Is there a “Spencer effect”?

    3. Are the adjustments to the data done properly, and what errors do those adjustments have?

    4. Does our test even have the power to find the effect size we expect?

    Oh, the always obligatory: this won’t change climate science.
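
    Question 4 can be made concrete with a standard power calculation. The sketch below uses a normal approximation for a two-sided paired test; it is not taken from Peterson 2003, and the effect size, standard deviation and pair counts in the usage lines are hypothetical inputs chosen only to show the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def paired_test_power(effect_c, sd_diff_c, n_pairs, alpha=0.05):
    """Approximate power of a two-sided paired z-test.

    effect_c  : true mean urban-minus-rural difference (deg C) we hope to detect
    sd_diff_c : standard deviation of the pairwise differences (deg C)
    n_pairs   : number of urban/rural station pairs
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_c * sqrt(n_pairs) / sd_diff_c
    # Probability that the test statistic falls in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: a 0.3 C effect against 1.0 C pair-to-pair scatter.
for n in (20, 100):
    print(n, round(paired_test_power(0.3, 1.0, n), 2))
```

    If the power is low for the effect size theory predicts, a null urban-minus-rural result says little either way.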

  38. carrot eater
    2010 March 13 at 2:13 pm | #38

    Peterson 2003 found that much of the difference between urban and rural went away after you accounted for elevation and TOB. I think they did use Karl for TOB; they just used some method to infer the TOB changes, instead of the station field notes. That is described here; I’ve not read it yet.

    Latitude didn’t make much difference, but that’ll vary from study to study, depending on what your clusters look like.

    Spencer adjusted for elevation, and nothing else I think.

    I think the basic message is correct: if you’re going to compare absolute temps instead of trends, you absolutely have to correct for elevation, and also TOB and instrumentation if you at all can. The last one gets tricky. If you don’t correct for the latter two, then you have to consider whether they will confound your results.
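
    As an illustration of the elevation step only, here is a minimal sketch that adjusts a temperature to a common reference height using a constant lapse rate. The 6.5 C/km default is the standard-atmosphere value, used here as an assumption; it is not necessarily what Peterson or Spencer applied.

```python
STANDARD_LAPSE_C_PER_KM = 6.5  # standard-atmosphere value; an assumed average


def adjust_to_reference(temp_c, station_elev_m, reference_elev_m=0.0,
                        lapse_c_per_km=STANDARD_LAPSE_C_PER_KM):
    """Estimate what temp_c would be at reference_elev_m.

    Air cools with height, so a station above the reference is warmed
    by the adjustment and a station below it is cooled.
    """
    return temp_c + lapse_c_per_km * (station_elev_m - reference_elev_m) / 1000.0


# A 10.0 C reading at 500 m, adjusted to sea level:
print(adjust_to_reference(10.0, 500.0))  # 13.25
```

    The lapse rate is left as a parameter because a single average value may not hold for every site.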

    Then you also have proximity to lakes, ocean, rivers. Not sure how you account for that.

    My gut says that for an individual station, microscale things are going to obscure the mesoscale UHI. Peterson is maybe consistent with that. Citing Oke, they come up with “Of the three scales, the microscale and local-scale effects generally are larger than mesoscale effects.”

    ENSO:climate change :: microscale:UHI :: noise:signal

    All that said, there will be some urban stations where you really do see an exaggerated trend, compared to the neighbors, and that isn’t due to the urban station changing elevation, latitude or TOB.

    As for whether things are really urban or rural: they claim some level of consistency between the nightlights and census data. If you want to sit there and look up all the urban stations Peterson used in Google Earth, to see if they’re urban and maybe in parks or by the river or whatever, please do.

  39. harrywr2
    2010 March 13 at 3:09 pm | #39

    I’m not the one doing the work here, so just as a suggestion.

    I would start by determining the stations with the highest anomalies. Then use the Mark 8 eyeball data analyzer and see if those stations have common characteristics.

    Then I would do the same for the lowest anomalies.

    It might be interesting to do the same comparisons by latitude as global warming theory says the impact should be greater in the extreme latitudes. So something along the lines of what are the identifiable differences between the warmest thermometer about 50 degrees north and the coldest.

    If I wander over to NASA GISS and select 250km Grid and 2000-2008 compared to the 1950-1980 average I can see a pronounced hot spot in the vicinity of Dhahran.
    I assume the GISS data is adjusted, just don’t know how.
    I know two things about Dhahran, it went thru rapid urbanization in the last 30 or 40 years and the population data in the GPW is homogenized to rural for Dhahran.

    Another hotspot on the GISS map appears to be Lusaka, Zambia. This city also has undergone rapid urbanization. Funny enough it’s next to a cold spot. The population data does appear to be gridded at the municipal level. So we have a hot and cold spot at the same latitude next to each other.

    Then I see a hot spot in Barrow, Alaska. I think the Alaska ‘oil boom’ started there in the mid-1970s. We actually have a station history:

    “Site continuity reasonable, but changes in summer exposure to ocean, and albedo changes due to growth of town, moves, snow clearance, dirt on snow, etc. could result in false long-term trends”

  40. 2010 March 13 at 3:58 pm | #40


    My only claim was this. Peterson expected to find a difference. When he found no difference he relied on Oke and siting guidelines and he
    POSTULATED that urban sites must be in cool parks.

    Since 2007 I haven’t been able to convince a single person (except damn skeptics) that:
    1. Peterson said this
    2. A survey of urban stations was called for to
    test this postulation.

    Anyways, I was hoping that surface stations would in the end have surveys of Peterson stations, but alas, Peterson didn’t use USHCN exclusively. I’ve found some of his rural stations in the surfacestations project, but the hit rate was low.

    Anyways, if you agree that such an investigation would at least make sense, then I feel like I’ve accomplished something.

    I always viewed Peterson’s paper as a nice beginning and an interesting method, but hardly the last word.

  41. 2010 March 13 at 4:04 pm | #41

    Yikes! That’s just data mining. Did you study dendrochronology? (OK, bad joke.)
    If you want to take that approach then it’s advisable to divide your sample and hold out
    stations; otherwise you are going to be overconfident in your results, or just plain in spurious correlationville.
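
    A minimal sketch of that kind of hold-out split, assuming stations are identified by simple ID strings (the function name and the 50/50 default are illustrative choices, not a recommendation from the thread):

```python
import random


def split_stations(station_ids, holdout_fraction=0.5, seed=0):
    """Randomly split station IDs into (explore, holdout) lists.

    Patterns are hunted for in `explore` only; `holdout` is looked at once,
    afterwards, to see whether the pattern survives.
    """
    rng = random.Random(seed)       # fixed seed makes the split reproducible
    shuffled = sorted(station_ids)  # sort first so input order does not matter
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

    Any common characteristic spotted among the high-anomaly stations in the exploration set then gets a single, pre-declared test on the held-out stations.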

  42. 2010 March 13 at 4:30 pm | #42

    The other issue with adjusting for altitude is that in an urban environment it’s not entirely clear that an “average” lapse rate adjustment will be accurate.

    Anyways, really cool work here:


  43. carrot eater
    2010 March 13 at 5:25 pm | #43

    It isn’t unusual to use the wider coop network for stuff like this; I think those stations are even used in the USHCN homogenisation now.

    They postulated about the local siting; if somebody wants to follow up on that, this could be a paper. It just gets hard because it becomes difficult to impossible to explicitly quantify local siting influences.

    I like the closing of Peterson

    “Additionally, as a community, we need to update our understanding of urban heat islands, to realize that this phenomenon is more complex than widely believed by those not immersed in the field. We should not view all oddly warmer stations as indications of UHI. Some urban stations are indeed warmer than nearby rural stations but almost the same number are colder.”

  44. Alex Heyworth
    2010 March 13 at 7:53 pm | #44

    steven mosher :
    POSTULATED that urban sites must be in cool parks.

    This would make the temperature readings lower than readings elsewhere in the urban area, but why would it affect the delta T over time?

  45. 2010 March 15 at 12:15 pm | #45

    I have gathered together my comments above in a single posting at http://oneillp.wordpress.com/ (the only post at this blog), with images inline and now enhanced to show 2.7 km square boxes at each station, 2.7 km being the spatial resolution of the radiance data. The updated R script in this consolidated blog post will also produce counts of urban, peri-urban and rural stations used, as well as of urban/rural changes.

  46. 2010 March 15 at 10:47 pm | #46

    Looking good Peter.

  47. 2013 April 6 at 3:49 pm | #47

    Peter O’Neill :
    I have gathered together my comments above in a single posting at http://oneillp.wordpress.com/ (the only post at this blog), with images inline and now enhanced to show 2.7 km square boxes at each station, 2.7 km being the spatial resolution of the radiance data. The updated R script in this consolidated blog post will also produce counts of urban, peri-urban and rural stations used, as well as of urban/rural changes.

    Just a note to clarify dead links in some of my comments here (I have just received e-mail notification of a new comment on this rather old thread). http://poneill.ucd.ie is now dead, as I retired later in 2010. That single posting at my blog gathering together my posts here can be found at http://oneillp.wordpress.com/2010/03/13/ghcn-metadata/

Comments are closed.
