
GHCNv2 and GRUMP Rural and Urban Extents

2010 March 13

Introduction

The GHCN v2.temperature.inv metadata for the GHCN v2.mean temperature records includes a “Rural/Small Town/Urban” flag. GISTEMP, until very recently, used this flag as part of its urbanization adjustments for non-US countries. That method has been deprecated; GISTEMP now uses satellite brightness as its indicator of urbanization. The GHCNv2 R/S/U flags indicate ‘association’ with a town (10,000-50,000 people) or city (>50,000). From An Overview of the Global Historical Climatology Network Temperature Database, Peterson and Vose, 1997:

Population. Examining the station location on an ONC (Operational Navigation Charts) would determine whether the station was in a rural or urban area. If it was an urban area, the population of the city was determined from a variety of sources. We have three population classifications: rural, not associated with a town larger than 10 000 people; small town, located in a town with 10 000 to 50 000 inhabitants; and urban, a city of more than 50 000. In addition to this general classification, for small towns and cities, the approximate population is provided.

The GRUMP Rural/Urban Extents is a gridded dataset with 43,200 columns and 16,800 rows covering 56°S to 84°N at a grid size of 0.008333 degrees (1/2 arc-minute). The dataset merges two sources of information: population and settlement extents. Population data are derived from a variety of sources, primarily national censuses. Settlement extent is derived from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) for a seven-month period in 1994/1995, from the ESRI Digital Chart of the World (DCW), and from Tactical Pilotage Charts (TPC). This is described in Methodologies to Improve Global Population Estimates in Urban and Rural Areas, Pozzi, Balk, Yetman, Nelson, and Deichmann, 2003. A more detailed discussion is available in The Distribution of People and the Dimension of Place: Methodologies to Improve the Global Estimation of Urban Extents, Balk, Pozzi, Yetman, Deichmann, and Nelson, 2005.

a Population
Population data were gathered primarily from official statistical offices (census data) and secondarily from other web sources, such as Gazetteer (www.gazetteer.de) and CityPop (www.citypop.de), or from specific individual databases when official statistical databases were not available. Based on the data available and applying UN growth rates, we estimated population in 1990, 1995, and 2000. In some cases, the records for cities and towns included latitude and longitude coordinates. For those where coordinates were not available, we matched the settlement name and administrative units with the National Imagery and Mapping Agency (NIMA) database of populated places (gnswww.nima.mil/geonames/GNS/index.jsp).
The resulting database constitutes what we will call “points”.

b Settlements extent
The physical extent of settlements has been derived both from raster and vector datasets, in
particular:
• Night-time lights, produced using time series data from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) for the period 1 October 1994 to 30 April 1995, where the pixel values are measurements of the frequency with which lights were observed normalized by the total number of cloudfree observations. To delineate the physical extent of human settlements we used the World Stable Lights dataset (“cities” component).
• Digital Chart of the World (DCW)’s Populated Places: an ESRI product originally developed for the US Defense Mapping Agency (DMA) using DMA data and currently available at 1:1,000,000 scale (1993 version). The “populated places” coverage is available for most countries and contains depictions of the urbanized areas (built-up areas) of the world that are represented as polygons at 1:1,000,000 scale.
• Tactical Pilotage Charts (TPC): standard charts produced by the Australian Defense Imagery and Geospatial Organization, at a scale of 1:500,000, originally designed to provide an intermediate scale translation of cultural and terrain features for pilots/navigators flying at very low altitudes. Each chart contains information on cultural, drainage/hydrography, relief, distinctive vegetation, roads, sand ridges, power lines, and topographical features. Settlements are reported both as polygons and points. Polygons and points were digitized for a number of countries,
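The UN growth-rate extrapolation described in the population section above presumably amounts to compound growth applied to the census figure. Here is a hypothetical sketch; the function name and the exact arithmetic are my assumptions, not GRUMP's actual procedure:

```python
def extrapolate_population(pop, census_year, target_year, annual_growth_rate):
    """Project a settlement's census population to a target year
    (1990, 1995, or 2000 in GRUMP) assuming a constant annual
    growth rate, i.e. compound growth."""
    return pop * (1.0 + annual_growth_rate) ** (target_year - census_year)
```

For example, a town of 10,000 in a 1988 census with a 2% annual growth rate would be estimated at about 10,400 for 1990.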

Method

The GRUMP data was retrieved as an ASCII file from the GPW web site. A simple Perl script was used to loop through the station data in the GHCNv2 file, extract the latitude and longitude, and use them to locate the GRUMP undefined/rural/urban values (0, 1, 2) in the GRUMP ASCII data file, determining the rural/urban extent for each station. A half-gridsize offset was applied to place the GRUMP grid points at the center of each cell. In addition, similar scripts were used to parse the GRUMP data and extract all the urban values for display.
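The lookup itself is simple index arithmetic. The post's script was Perl; this Python sketch is only illustrative (the names and the exact treatment of the grid origin are my assumptions, not the original code). Floor-indexing the containing cell corresponds to treating each grid value as sitting at its cell center, i.e. the half-gridsize offset described above:

```python
NCOLS, NROWS = 43200, 16800   # GRUMP grid dimensions
PER_DEG = 120                 # cells per degree (1/2-arc-minute cells)
LAT_MAX, LON_MIN = 84.0, -180.0  # grid spans 56S..84N, 180W..180E

def grump_cell(lat, lon):
    """Return (row, col) of the GRUMP cell containing a station.
    Rows count down from 84N; columns count east from 180W."""
    col = int((lon - LON_MIN) * PER_DEG)
    row = int((LAT_MAX - lat) * PER_DEG)
    return row, col
```

A station at 40.01N, 74.99W, say, lands in row int((84 − 40.01) × 120) = 5278, column int((−74.99 + 180) × 120) = 12601; the undefined/rural/urban value is then read from that position in the ASCII grid.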

The GRUMP urban extents in CONUS and some surrounding regions are displayed against a black background to create a ‘faux’ brightness map. This is compared to the NOAA/DMSP brightness map for CONUS.

GRUMP US

DMSP US

http://www.ngdc.noaa.gov/dmsp/pres/low_light_120701/html/page4.html

The GRUMP urban extents for the world are displayed against a black background to create a ‘faux’ brightness map. This is compared to the NOAA/DMSP brightness map for the world.

GRUMP World

DMSP World
http://antwrp.gsfc.nasa.gov/apod/ap001127.html

In the GHCNv2 v2.temperature.inv file, there are 7280 stations. Of these, 1959 are marked “Urban”, 1409 marked as “Small Town”, 3912 marked as “Rural”, and 0 with no designation.

Calculating the GRUMP designators for the GHCN v2 stations, 3549 are marked “Urban” and 3249 are marked “Rural”. There are 482 undefined stations.

Comparing the GHCNv2 values with the GRUMP extents, we get a significant mismatch in the Rural/Urban designations.

GHCNv2 Urban, GRUMP Urban: 1673
GHCNv2 Urban, GRUMP Rural: 213
GHCNv2 Urban, GRUMP undef: 73

GHCNv2 Small, GRUMP Urban: 1032
GHCNv2 Small, GRUMP Rural: 324
GHCNv2 Small, GRUMP undef: 53

GHCNv2 Rural, GRUMP Urban: 844
GHCNv2 Rural, GRUMP Rural: 2712
GHCNv2 Rural, GRUMP undef: 356

In the following figure, the stations in which [GHCNv2=rural, GRUMP=urban] are marked red. The stations in which [GHCNv2=urban, GRUMP=rural] are marked blue. Undefined stations are marked in green. All other GHCNv2 stations are marked yellow.

GHCN station locations

Remarks

I’m surprised that the GRUMP data set is using 15-year-old DMSP imagery. DMSP OLS night-light images are available up to 2008.

Roughly 5% of the GHCNv2 stations are undefined in the GRUMP extent data set. Possible sources of error include the latitude/longitude listed in GHCNv2, missing data in the GRUMP data set, resolution errors in the GRUMP dataset, and gridding offset errors in the data lookup routines. Many of the undefined stations are located near water features and are likely ‘lost’ because their listed locations fall within the water feature.

Excluding undefineds, 11% of the GHCNv2 urban designations and 23% of the GHCNv2 rural designations are not confirmed in the GRUMP data set.
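These percentages follow directly from the cross-tabulation above; a quick sketch to reproduce them (the counts are copied from the tables in this post):

```python
# GHCNv2 class -> GRUMP class counts, from the tables above
counts = {
    ("urban", "urban"): 1673, ("urban", "rural"): 213, ("urban", "undef"): 73,
    ("small", "urban"): 1032, ("small", "rural"): 324, ("small", "undef"): 53,
    ("rural", "urban"): 844,  ("rural", "rural"): 2712, ("rural", "undef"): 356,
}
assert sum(counts.values()) == 7280  # matches the station total

def mismatch_rate(ghcn_class):
    """Fraction of GHCNv2 stations of a class that GRUMP assigns the
    opposite class, excluding cells where GRUMP is undefined."""
    opposite = "rural" if ghcn_class == "urban" else "urban"
    confirmed = counts[(ghcn_class, ghcn_class)]
    flipped = counts[(ghcn_class, opposite)]
    return flipped / (confirmed + flipped)

# urban: 213/1886 = 0.113; rural: 844/3556 = 0.237
```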

References

Balk, Pozzi, Yetman, Deichmann, Nelson, The Distribution of People and the Dimension of Place: Methodologies to Improve the Global Estimation of Urban Extents, 2005

Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); The World Bank; and Centro Internacional de Agricultura Tropical (CIAT). 2004. Global Rural-Urban Mapping Project (GRUMP), Alpha Version: Urban Extents. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University. Available at http://sedac.ciesin.columbia.edu/gpw (accessed 2010 March 13).

Peterson, Vose, An Overview of the Global Historical Climatology Network Temperature Database, 1997.

Pozzi, Balk, Yetman, Nelson, Deichmann, Methodologies to Improve Global Population Estimates in Urban and Rural Areas, 2003.

  1. 2010 March 13 at 10:51 pm

    One thing about working on your own posts, it leaves less time to read other blogs. Just saw Spencer’s latest UHI. Nice stuff. Seems to be relying on GPW for global pop data.

  2. carrot eater
    2010 March 14 at 4:55 am

    I don’t know if it’s just my eye, but it looks as if GRUMP is identifying more African urban extents than the satellite alone.

  3. 2010 March 14 at 8:26 am

    I agree.

    More urban areas identified than by brightness alone – especially in the non-industrial world. And a filtering out of ‘thinly’ illuminated areas (North American highways). Some of this is due, again, to the use of ‘administrative zones’ to define the areas of interest. In other words, if I have this right, the polygons used to draw the urban areas are not defined by the satellite imagery – but by local national governments. The satellites and aerial imagery are used to help characterize a particular polygon as urban or rural – as is population density. Population density is used directly where possible, or modeled by algorithms that distribute the coarser density figures governments report at lower resolution.

    And keep in mind that UHI effects may occur 20 miles downwind from an urban center.

    Ultimately, the point of this exercise is to find locations most likely to be affected by UHI. Population density is just one marker of urbanization. Brightness is another. Urban extent is a third. Satellite land use categorization or $GDP/km^2 are others.

    There are dense urban populations in much of the non-industrialized world which are not as energy intensive as industrialized suburbs. Areas in which the native vegetation has been replaced, but in which concrete buildings and tar roads are not as common. American cities in the Western United States often have more trees and ‘green grass’ than the surrounding arid countryside.

    Spencer’s UHI shows a negative response in some US population ranges. Real or Noise? If real, is it due to the fact that ‘suburbanization’ has a cooling effect? That the suburban vegetation is more cooling than the native landscape or the prior agriculture in that area? Just speculation.

  4. 2010 March 14 at 10:43 am

    Nice work.

    With a combination of the google earth stations tour and this data you could build a really
    cool application where somebody could tour the stations: inspect the GHCN data, look at a
    nightlights picture for that site, look at the population data.

    Any chance of posting a file for download with the Station id, lat/lon, pop data, etc?

  5. 2010 March 14 at 11:03 am

    In your mismatch files.. I assume this is a lat lon

    43.95, 141.63 ?

    And assuming that’s 43.95 N, 141.63 E

    Then yes you have a location there near the water.

    There is a database that has urban extents by the coastline. I think it’s used for
    impact assessments due to global warming.

    Let me see if I can find it.

  6. 2010 March 14 at 11:05 am

    Just updated the station files linked above to include station ids.
    Also, a slightly improved version of the stations map. (bigger points)

    I’m planning on posting augmented versions of v2.temperature.inv and ushcn-v2-stations.txt. Not sure when yet.

  7. 2010 March 14 at 11:11 am

    GPW has a coastal urban extents file.

    But the ‘coastal’ issue is more about ‘lost’ stations.
    Three possible sources for errors:
    1) bad lat/long in GHCN file
    2) gridding issues on coastal boundaries in GPW/GRUMP
    3) bad grid lookup in my code

    #1 errors can be corrected by visual inspection of locations in google maps.

  8. 2010 March 14 at 1:15 pm

    Ok,

    That will be cool.

    In the GHCN.inv the last char is either A/B/C

    The readme.f doesnt say this is GHCN nightlights

    but I recall an old file that had both GHCN nightlights and GHCN brightness index ..

    Anyways, it will be nice when there is a metadata database. Looking forward to it.

  9. 2010 March 14 at 1:23 pm

    Ok, run a screen and screen out GHCN that are
    CO or LA or A.

    Selecting GHCN = RURAL, COAST =no, LAKE=no, Airport = NO, should get you 1975
    rural stations that are not on the coast or near a lake or at an airport.

    (by decoding the land use string.)

    Then compare GHCN rural ( no coast, no lake, no airport) with GRUMP.

    Could give you some kind of idea about how much is gridding issues on the coast and around lakes and airports.
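The screen described above could be sketched roughly as follows. The field names (popcls for the R/S/U flag, stloc for the coastal/lake CO/LA code, airstn for the airport flag) are my recollection of the GHCN v2 readme, so treat them as assumptions and check the actual fixed-width layout before relying on this:

```python
def is_clean_rural(rec):
    """True for a GHCNv2 inventory record that is flagged rural and is
    not coastal (CO), lakeside (LA), or at an airport (A).
    rec is a dict parsed from v2.temperature.inv; keys are assumed."""
    return (rec["popcls"] == "R"
            and rec["stloc"] not in ("CO", "LA")
            and rec["airstn"] != "A")
```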

  10. 2010 March 14 at 2:07 pm

    If you apply the following screen

    GHCN = rural
    GHCN = No Coast, no lake, no airport
    GRUMP = rural.

    You end up with 1564 Stations.

    Since the GRUMP urban extent is based on 1995 nightlights and 1993 data,
    I think you would want to ensure that these places have remained rural by using
    more up-to-date data: population, nightlights, or both.

  11. Zeke Hausfather
    2010 March 15 at 8:59 am

    Mosh,

    If you give me a list of station IDs, I’d be happy to run them through the model and see what temp trend they give, as well as compare them to other stations in the same grid cells ;)

  12. 2010 March 15 at 10:45 am

    Better than that Zeke!

    https://surfacestations.dabbledb.com

    You can just select a view ( Grump Rural, no water no airport )

    Go to the format options: select either xls or Csv and download the Inventory.

    If you like I can add you as an admin, and Ron too.

    basically if you create a File or a URL ( like ron did ) for a data file, Then it can be imported
    and merged, provided you use the full GHCN id.

    You need:

    GHCN identifier ( I read it in as text )
    New Field, New Field, new Field, etc etc

    I did mine a bit differently than yours: I keep the full GHCN identifier country+WMO+mod
    and then I split it. You’ll see.

    Anyways, if you post your stuff ( GHCN identifier, data, data, data)
    The program will figure out the rest.

    Just finished it this morning.

  13. 2010 March 15 at 11:36 am
  14. 2010 March 15 at 11:40 am

    Ron,

    You can see as well that the vast majority of GRUMPs that are unidentified are
    either on the coast or by a lake.

    Only 15 GRUMP unidentified are Coast=No, Lake =no.

    http://surfacestations.dabbledb.com/publish/surfacestations/e8f59633-8856-42b4-a2f2-560ec3a82de0/grumpundefinednotcoastorlake.html

  15. 2010 March 16 at 8:47 am

    Mosh: here you go

    1900-2009 trends:
    All stations – 0.075
    Mosh stations – 0.072

    1960-2009 trends:
    All stations – 0.202
    Mosh stations – 0.232

  16. harrywr2
    2010 March 16 at 11:13 am

    Zeke Hausfather :
    Mosh: here you go
    http://i81.photobucket.com/albums/j237/hausfath/Picture189.png
    1900-2009 trends:
    All stations – 0.075
    Mosh stations – 0.072
    1960-2009 trends:
    All stations – 0.202
    Mosh stations – 0.232

    Unfortunately, the rural, no coast, no lake datafile has urban areas in it. Gerryville, Algeria AKA El Bayadh, Algeria was one.

    Air travel has gone from 100 billion passenger-kilometers in 1960 to 3 trillion passenger-kilometers in 2000 while global population went from 3 billion to 6 billion.

    A 30 fold increase in air travel. Where are the majority of the thermometers?

  17. 2010 March 16 at 7:48 pm

    Heh! dabbledb, huh.
    Blocked at work.
    kinda fun.

    Anybody know what the definition of the A/B/C categorization of brightness is?

  18. 2010 March 16 at 10:18 pm

    Thanks harry. I’m sure the screen I suggested is not perfect. The point is trying to use a screen to get a list of stations that can be scrutinized in more detail, and then add to that list.

    The other point is this: we may be able to find a UHI signal if we compare the best to the worst. I would expect to. We may also find that this difference, while real, is negligible. In everything I’ve looked at it’s clear that the signal is NOT huge. I see nothing wrong with getting down to the end and stating that UHI is within the noise floor. Going in I expect it to be bigger than Jones’ .05/century, but not more than say .15.

  19. 2010 March 16 at 10:19 pm

    Na, I looked for that a while back; maybe it comes out of Peterson 2003? I would ask them but my name is mud. fair enough.

  20. 2010 March 16 at 10:22 pm

    Cool thanks. At some point I want to add density to the metadata. Can you post a list of ghcnid and associated density? I’ll add it in.

  21. 2010 March 16 at 10:24 pm

    Ya it is kinda fun. quick and dirty. its cool too in that i can just click on a location and go to google
    …sometimes

  22. 2010 March 17 at 6:41 am

    Link from the previous post:

    http://rhinohide.org/rhinohide.cx/co2/ghcn/data/ghcn_station_inv_popdensity.txt

    But that comes with the big caveat that popdensities are known to be too low in countries that don’t have good census resolution.

    Also, I’m going to be redoing my method for data extraction. Been working with the java-netcdf code in order to extract data from the DMSP files – it should work well with the GPW stuff as well.

  23. 2010 March 17 at 8:30 am

    Mosh: slightly OT, but you should tell Anthony that rather silly articles like http://wattsupwiththat.com/2010/03/16/rewriting-the-decline/ really don’t help advance the debate… I don’t even want to try and guess what shape temp databases were in during the 70s before the WMO and NCDC decided to work on collecting and assembling records. Oddly enough, the worries over declining temps in the 70s might have provided some of the inspiration for the initial WMO efforts to collect station data (which were in the late 70s/early 80s iirc).

  24. carrot eater
    2010 March 17 at 9:09 am

    Look on page 1327 of this.

    http://ams.allenpress.com/archive/1520-0477/89/9/pdf/i1520-0477-89-9-1325.pdf

    Had nothing to do with adjustments; it was data cover.

    See here.

    See the difference between NH and SH? At the time, there wasn’t good coverage in the SH.

    Quality control is questionable at WUWT, as always.

  25. 2010 March 17 at 2:04 pm

    Zeke,

    I’ve been meaning to post on that article. After a couple paragraphs I just threw up my hands. I see no point in pulling up old graphs ( kinda like the Lamb diagram ) Perhaps I should go over and say something. I prefer to do my comments in public. On one occasion I have been asked [not by anthony] to review stuff [not by anthony] before it went up . my response was don’t publish. That advice was either not passed on or my vote wasnt enough. Since then I’ve been asked to review one other piece and I declined on principle. I’d rather make my comments public.

    WRT Anthony’s editorial choices. His principles are not mine. Just from an external perspective
    I think he seems quite happy to post many things that are on the edge and let his readers sort out the wheat from the chaff. That’s an interesting experience. I suppose your view of things is that WUWT is the National Enquirer. Sometimes, however, the Enquirer gets things right. This puts a huge burden on a reader like me. I’d like some of the chaff separated. But strangely, people are drawn to a site where there is an ACTIVE sorting of the wheat from the chaff. go figure. In the internet era people want to be a part of the editorial review. they dont trust the news unless they can talk back to it. maybe I’ll write about this phenomenon.
    let’s just say that by old standards Realclimate “should” have more credibility than any other site.
    but it doesn’t. why is that? well you can blame the readers and say they are ignorant, but I’m thinking it has more to do with this. It’s the internet. people expect to be able to interact freely. they believe in the power of two way communication. Gosh I remember my first time on Slate
    ( I used to hang there with glenn reynolds) when the author of an article came down into the comments to argue with me. Damn. a conversation. I’m postulating that the internet changes our conception of how we come to truth, or rather returns it to an experience rooted in dialog.
    As irrational as this seems I trust gavin and judith curry more than someone like jones merely because they dialog with their opponents ( gavin sometimes posts to other blogs). I trust Rob Wilson more than Briffa, and Mann not at all. Part of this of course is reading that Mann has a policy of ignoring. And of course I trust you and ron and the guys at CCC more than Tamino.
    Again, because you dialogue and share your work and TOOLS. those tools empower me.
    If you share your tools ( code) you are sharing your power. That builds huge trust.

    Finally, let me let you in on a little bit from behind the curtain in skeptic ville. ( luke warmers as well) On a few occasions I have made private suggestions to all the notable “skeptics” of things I would look at. result? ZERO.
    these damn people just do what the hell they are interested in. there is another funny thing. Once they get interested in a thing they dont let go. This is just an observation. I think if people in the warmist camp understood that some of their opponents have “hobby horses” they could deal with them more effectively. If you view them as part of a conspiracy, then they just secretly laugh to themselves, because you got them wrong. When you characterize them [ not you zeke, I'm not saying you do this] as part of an organized effort then they just have contempt for you because you get them entirely wrong. then they ride that hobby horse harder and longer.

  26. 2010 March 17 at 3:36 pm

    Thanks Ron.

    Your early version of the density data is now up at

    http://surfacestations.dabbledb.com/publish/surfacestations

    Some notes:

    For some stations you had a 9999 flag for density.
    there were also stations ( maybe islands? ) where you had no data. I flagged those as 99999

    I created a view for folks to look at or download: population density.
    I also took the list I asked Zeke to run ( no coast, no airport, ruralghcn, rural grump) and sorted
    it by density.

    Looks like some work is required. I will add in GISS nightlights, but the problem there is GISS drops stations from GHCN. so I’d rather put in the source data..

    However, until all the positions are fixed this is largely busy work.

    For uploading, a CSV file works great, as I can just point to a CSV file as a url.
    but sometimes the tool thinks lines of data are just txt. like with a semicolon separator.

    I can also read in column headers.

    ghcnid,lat,lon,metadata
    xxxxx,xx,xx,x
    xxxxx,xx,xx,x

    Missing data is ok, but just a pain

  27. 2010 March 17 at 4:16 pm

    Mosh,

    Ron should have his own version of nightlight data up shortly. The only issue is that the current scale used (0 to 63 for brightness) doesn’t correspond to the one used in the initial readings from the satellites at the time that the dark/dim/bright designations were created :P

    I looked at the distribution and somewhat arbitrarily set 0-20 as Dark, 20-40 as Dim, and 40-63 as Bright, which seems to work okay, but we probably should try and figure out if there is a better way.
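That ad hoc binning is easy to state in code; the cutoffs 20 and 40 are the arbitrary choices described above, not an official DMSP convention:

```python
def brightness_class(dn):
    """Bin a 0-63 DMSP-OLS digital number into dark/dim/bright
    using the provisional cutoffs suggested above."""
    if dn < 20:
        return "dark"
    if dn < 40:
        return "dim"
    return "bright"
```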

  28. 2010 March 18 at 12:41 am

    Ok,

    I read in NASA nightlights ( from v2 ) And then to my shock realized that the .inv file from GISTEMP is POST step one, which means some stations have been added ( antarctica ) and
    I know that GISTEMP drops some strange stations.

    Funny, I was looking at the nasa scale and saw numbers out past 100 and I swore that
    Imhoff 97 was scaled to 0-100. Anyways, it looks like they have different scales. The other
    thing I thought of was it’s best to preserve the raw data and not bin it. Like trying to track down that GHCN designation ( A/B/C) nightmare for follow on analysts.

    Anyways, lots of balls up in the air. jeffId has a nice post on anomalies and roman’s method

    this stuff is way more fun than tree rings

  29. 2010 March 18 at 12:44 am

    pop density added back in after system restore in case you guys were looking for it during my crash. Their restore points are far apart, so I have to do manual back ups from now on.

  30. 2010 March 18 at 2:32 pm

    For some reason I cant download the Nightlights source data. maybe they block me cause I’m not .edu or .gov?

    no worries. Cluttering up my business laptop with this stuff is getting unwieldy, so maybe I can justify a new system!

  31. 2010 March 18 at 4:27 pm

    The files here?

    http://www.ngdc.noaa.gov/dmsp/downloadV4composites.html

    They are really big. About 300 mb compressed. Could well be a smart proxy server blocking you.

  32. 2010 March 18 at 4:31 pm

    GISS v2.inv includes …

    A/B/C with modifiers 1/2/3 (ie C_ or C1 are possible)

    … and …

    0-186

  33. 2010 March 18 at 10:48 pm

    Ya ron those files. No joy for me. i’ll try again

  34. 2010 March 19 at 2:43 pm

    works now.

  35. 2010 March 20 at 7:40 am

    J. Hansen, R. Ruedy, M. Sato, and K. Lo, 2010, Current GISS Global Surface Temperature Analysis (draft)

  36. harrywr2
    2010 March 20 at 9:03 am

    I like this quote from the paper
    “Temperature records in the United States are especially prone to uncertainty”

    Population by continent…Africa went from 366 million in 1970 to over 1 billion now.
    Asia went from 2 billion to 4 billion. US population went from 200 million to 300 million. Europe is pretty much flat.

    https://www.census.gov/compendia/statab/2010/tables/10s1294.xls

  37. 2010 March 21 at 11:27 am

    Thanks Ron. It was good reading.

    The great thing in my mind is that now people have some tools in their hands to do their own
    hypothesis testing: either with the CCC version of GISTEMP, or Zeke’s stuff, or CRU’s approach.
    JeffId is also working up a version that should be interesting.

    As people find that these methods give largely the same answer, questions about method
    will focus in on issues that are incompletely described ( in my mind)

    1. Issues with how GISS handles the arctic. ( Tilo had an interesting post on this )
    This is one of those “improvements” that doesn’t change the big answer but just
    looks more closely at the difference ( small) between CRU and GISTEMP
    2. How averaging is done for grids that are over land and water. CRU and GISTEMP
    do it differently. Again, an area for improvement without substantial change.
    3. Various approaches to spatial averaging. ( improvements that don’t change the final answer)
    4. Calculations of error due to sampling ( spatial coverage ). Improvements that communicate better the level of certainty in the data.

    Next, are the issues of metadata:

    1. Having accurate locations for sites ( besides USHCN)
    2. A good set of alternative proxies for urban/rural studies.

    And then the issues of raw data and its adjustment:
    1. TOBS adjustments in the US and ROW
    2. Adjustments for instrument changes.
    3. Microsite degradation. ( some interesting posts on Lucia’s site )

  38. 2010 March 21 at 10:39 pm

    Interesting. Good description of the benefits and limitations of each approach. I bet we will see some new pop density databases soon as high-res satellite imaging and better object-identification algorithms are developed.

    Heck, back in 2005 when that report was written Google Earth had just come out :-p
