GHCNv2 and GRUMP Rural and Urban Extents
Introduction
The GHCN v2.temperature.inv metadata for the GHCN v2.mean temperature records includes a “Rural/Smalltown/Urban” flag. Until very recently, GISTEMP used this flag as part of its urbanization adjustments for non-US countries. That method has been deprecated, and GISTEMP now uses satellite brightness as its indicator of urbanization. The GHCNv2 R/S/U flags indicate ‘association’ with a town (10-50K) or city (>50K), as described in An Overview of the Global Historical Climatology Network Temperature Database, Peterson and Vose, 1997:
Population. Examining the station location on an ONC (Operational Navigation Charts) would determine whether the station was in a rural or urban area. If it was an urban area, the population of the city was determined from a variety of sources. We have three population classifications: rural, not associated with a town larger than 10 000 people; small town, located in a town with 10 000 to 50 000 inhabitants; and urban, a city of more than 50 000. In addition to this general classification, for small towns and cities, the approximate population is provided.
The GRUMP Rural/Urban Extents is a gridded dataset with 43200 columns and 16800 rows covering 56S to 84N with a grid size of 0.008333 degrees (1/2 minute, or 30 arc-seconds). The data set merges two sources of information: population and settlement extents. Population data is derived from a variety of sources, primarily national censuses. Settlement extent is derived from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) for a seven-month period in 1994/1995, from an ESRI Digital Chart of the World (DCW), and from Tactical Pilotage Charts (TPC). This is described in Methodologies to Improve Global Population Estimates in Urban and Rural Areas, Pozzi, Balk, Yetman, Nelson, Deichmann, 2003. More detailed discussion is available in The Distribution of People and the Dimension of Place: Methodologies to Improve the Global Estimation of Urban Extents, Balk, Pozzi, Yetman, Deichmann, Nelson, 2005:
a Population
Population data were gathered primarily from official statistical offices (census data) and secondarily from other web sources, such as Gazetteer (www.gazetteer.de) and CityPop (www.citypop.de), or from specific individual databases when official statistical databases were not available. Based on the data available and applying UN growth rates, we estimated population in 1990, 1995, and 2000. In some cases, the records for cities and town included latitude and longitude coordinates. For those where coordinates were not available, we matched the settlement name and administrative units with the National Imagery and Mapping Agency (NIMA) database of populated places (gnswww.nima.mil/geonames/GNS/index.jsp).
The resulting database constitutes what we will call “points”.
b Settlements extent
The physical extent of settlements has been derived both from raster and vector datasets, in particular:
• Night-time lights, produced using time series data from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) for the period 1 October 1994 to 30 April 1995, where the pixel values are measurements of the frequency with which lights were observed normalized by the total number of cloudfree observations. To delineate the physical extent of human settlements we used the World Stable Lights dataset (“cities” component).
• Digital Chart of the World (DCW)’s Populated Places: an ESRI product originally developed for the US Defense Mapping Agency (DMA) using DMA data and currently available at 1:1,000,000 scale (1993 version). The “populated places” coverage is available for most countries and contains depictions of the urbanized areas (built-up areas) of the world that are represented as polygons at 1:1,000,000 scale.
• Tactical Pilotage Charts (TPC): standard charts produced by the Australian Defense Imagery and Geospatial Organization, at a scale of 1:500,000, originally designed to provide an intermediate scale translation of cultural and terrain features for pilots/navigators flying at very low altitudes. Each chart contains information on cultural, drainage/hydrography, relief, distinctive vegetation, roads, sand ridges, power lines, and topographical features. Settlements are reported both as polygons and points. Polygons and points were digitized for a number of countries,
Method
The GRUMP data was retrieved as an ASCII file from the GPW web site. A simple Perl script looped through the station data in the GHCNv2 file, extracted each station’s latitude and longitude, and used them to look up the GRUMP undefined/rural/urban value (0, 1, 2) in the GRUMP ASCII data file, giving a rural/urban extent for each station. A half-gridsize offset was applied to place the GRUMP grid points at the center of each cell. Similar scripts were used to parse the GRUMP data and extract all the urban values for display.
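The lookup was done in Perl, but the index arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the grid is taken to span 180W to 180E with its north edge at 84N, row 0 is assumed to be the northernmost row, and the function names are made up here. None of that is read from the actual GRUMP file header.

```python
import math

# Grid geometry as described in the text. The edge coordinates and the
# row order are assumptions, not values read from the GRUMP header.
NCOLS, NROWS = 43200, 16800
CELLS_PER_DEGREE = 120          # 30 arc-second cells
WEST, NORTH = -180.0, 84.0      # assumed outer edges of the grid


def latlon_to_rowcol(lat, lon):
    """Return (row, col) of the cell containing the point, or None if off-grid."""
    row = math.floor((NORTH - lat) * CELLS_PER_DEGREE)
    # Wrap longitude so that 180E lands in the same column as 180W.
    col = math.floor((lon - WEST) * CELLS_PER_DEGREE) % NCOLS
    if not 0 <= row < NROWS:
        return None
    return row, col


def cell_center(row, col):
    """Return (lat, lon) of a cell's center: the cell corner plus half a cell."""
    lat = NORTH - (row + 0.5) / CELLS_PER_DEGREE
    lon = WEST + (col + 0.5) / CELLS_PER_DEGREE
    return lat, lon
```

The half-cell term in `cell_center` is the half-gridsize offset: without it, each grid value would be attributed to a cell corner rather than the cell center, which shifts every lookup by up to half a cell.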
The GRUMP urban extents in CONUS and some surrounding regions are displayed against a black background to create a ‘faux’ brightness map. This is compared to the NOAA/DMSP brightness map for CONUS.
http://www.ngdc.noaa.gov/dmsp/pres/low_light_120701/html/page4.html
The GRUMP urban extents for the world are displayed against a black background to create a ‘faux’ brightness map. This is compared to the NOAA/DMSP brightness map for the world.

http://antwrp.gsfc.nasa.gov/apod/ap001127.html
In the GHCNv2 v2.temperature.inv file, there are 7280 stations. Of these, 1959 are marked “Urban”, 1409 marked as “Small Town”, 3912 marked as “Rural”, and 0 with no designation.
Calculating the GRUMP designators for the GHCN v2 stations, 3549 are marked “Urban” and 3249 are marked “Rural”. There are 482 undefined stations.
Comparing the GHCNv2 values with the GRUMP extents, we get a significant mismatch on the Rural/Urban designations:
GHCNv2 Urban, GRUMP Urban: 1673
GHCNv2 Urban, GRUMP Rural: 213
GHCNv2 Urban, GRUMP undef: 73
GHCNv2 Small, GRUMP Urban: 1032
GHCNv2 Small, GRUMP Rural: 324
GHCNv2 Small, GRUMP undef: 53
GHCNv2 Rural, GRUMP Urban: 844
GHCNv2 Rural, GRUMP Rural: 2712
GHCNv2 Rural, GRUMP undef: 356
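A table like this is just a count of (GHCNv2 flag, GRUMP class) pairs. Here is a minimal sketch of that tally in Python; the five sample pairs are invented for illustration and are not real stations.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Count stations for each (GHCNv2 flag, GRUMP class) combination."""
    return Counter(pairs)


# Hypothetical sample: (GHCNv2 flag, GRUMP class) for five stations.
sample = [
    ("Urban", "Urban"),
    ("Rural", "Urban"),
    ("Rural", "Rural"),
    ("Small", "undef"),
    ("Rural", "Rural"),
]

table = cross_tabulate(sample)
for (ghcn, grump), n in sorted(table.items()):
    print(f"GHCNv2 {ghcn}, GRUMP {grump}: {n}")
```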
In the following figure, the stations in which [GHCNv2=rural,GRUMP=urban] are marked red. The stations in which [GHCNv2=urban,GRUMP=rural] are marked blue. Undefined stations are marked in green. All other GHCNv2 stations are marked yellow.
Remarks
I’m surprised that the GRUMP data set is using 15-year-old DMSP imagery. DMSP OLS night light images are available up to 2008.
Roughly 7% of the GHCNv2 stations (482 of 7280) are undefined in the GRUMP extent data set. Possible sources of error include the latitude/longitude listed in GHCNv2, missing data in the GRUMP data set, resolution errors in the GRUMP data set, and gridding offset errors in the data lookup routines. Many of the undefined stations are located near water features and are likely ‘lost’ because an erroneous location places them within the water feature.
Excluding undefineds, 11% of the GHCNv2 urban designations and 23% of the GHCNv2 rural designations are not confirmed in the GRUMP data set.
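These shares can be rechecked from the nine counts in the comparison list. The sketch below copies the counts from the post; the helper name `unconfirmed_share` is made up here.

```python
# Counts copied from the comparison list: (GHCNv2 flag, GRUMP class) -> stations.
counts = {
    ("Urban", "Urban"): 1673, ("Urban", "Rural"): 213, ("Urban", "undef"): 73,
    ("Small", "Urban"): 1032, ("Small", "Rural"): 324, ("Small", "undef"): 53,
    ("Rural", "Urban"): 844, ("Rural", "Rural"): 2712, ("Rural", "undef"): 356,
}

total = sum(counts.values())
undefined = sum(n for (_, grump), n in counts.items() if grump == "undef")


def unconfirmed_share(flag, opposite):
    """Share of GHCNv2 `flag` stations that GRUMP classes as `opposite`, ignoring undef."""
    agree = counts[(flag, flag)]
    disagree = counts[(flag, opposite)]
    return disagree / (agree + disagree)


print(f"undefined: {undefined} of {total} ({undefined / total:.1%})")
print(f"urban not confirmed: {unconfirmed_share('Urban', 'Rural'):.1%}")
print(f"rural not confirmed: {unconfirmed_share('Rural', 'Urban'):.1%}")
```

Small Town stations are left out of both ratios, since GRUMP has no matching class for them.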
References
Balk, Pozzi, Yetman, Deichmann, Nelson, The Distribution of People and the Dimension of Place: Methodologies to Improve the Global Estimation of Urban Extents, 2005
Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); The World Bank; and Centro Internacional de Agricultura Tropical (CIAT). 2004. Global Rural-Urban Mapping Project (GRUMP), Alpha Version: Urban Extents. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University. Available at http://sedac.ciesin.columbia.edu/gpw (accessed 2010 Mar 13).
Peterson, Vose, An Overview of the Global Historical Climatology Network Temperature Database, 1997.
Pozzi, Balk, Yetman, Nelson, Deichmann, Methodologies to Improve Global Population Estimates in Urban and Rural Areas, 2003.


One thing about working on your own posts, it leaves less time to read other blogs. Just saw Spencer’s latest UHI. Nice stuff. Seems to be relying on GPW for global pop data.
I don’t know if it’s just my eye, but it looks as if GRUMP is identifying more African urban extents than the satellite alone.
I agree.
More urban areas identified than by brightness alone – especially in the non-industrial world. And a filtering out of ‘thinly’ illuminated areas (North American highways). Some of this is due, again, to the use of ‘administrative zones’ to define the areas of interest. In other words, if I have this right, the polygons used to draw the urban areas are not defined by the satellite imagery – but by local national governments. The satellites and aerial imagery are used to help characterize a particular polygon as urban or rural – as is population density. Pop density is used directly where possible or modeled by algorithms which calculate a population density distribution based on the coarser population density provided by the government in that region for a lower resolution.
And keep in mind that UHI effects may occur 20 miles (downwind) from an urban center.
Ultimately, the point of this exercise is to find locations most likely to be affected by UHI. Population density is just one marker of urbanization. Brightness is another. Urban extent is a third. Satellite land use categorization or $GDP/km^2 are others.
There are dense urban populations in much of the non-industrialized world which are not as energy intensive as industrialized suburbs. These are areas in which the native vegetation has been replaced but in which concrete buildings and tar roads are not as common. American cities in the Western United States often have more trees and ‘green grass’ than the surrounding arid countryside.
Spencer’s UHI shows a negative response in some US population ranges. Real or Noise? If real, is it due to the fact that ‘suburbanization’ has a cooling effect? That the suburban vegetation is more cooling than the native landscape or the prior agriculture in that area? Just speculation.
Nice work.
With a combination of the google earth stations tour and this data you could build a really
cool application where somebody could tour the stations: inspect the GHCN data, look at a
nightlights picture for that site, look at the population data.
Any chance of posting a file for download with the Station id, lat/lon, pop data, etc?
In your mismatch files.. I assume this is a lat lon
43.95, 141.63 ?
And assuming that’s 43.95 N, 141.63 E
Then yes you have a location there near the water.
There is a database that has an urban extent by the coastline. I think it’s used for
impact assessments due to global warming.
Let me see if I can find it.
Just updated the station files linked above to include station ids.
Also, a slightly improved version of the stations map. (bigger points)
I’m planning on posting augmented versions of v2.temperature.inv and ushcn-v2-stations.txt. Not sure when yet.
GPW has a coastal urban extents file.
But the ‘coastal’ issue is more about ‘lost’ stations.
Three possible sources for errors:
1) bad lat/long in GHCN file
2) gridding issues on coastal boundaries in GPW/GRUMP
3) bad grid lookup in my code
#1 errors can be corrected by visual inspection of locations in google maps.
Ok,
That will be cool.
In the GHCN.inv the last char is either A/B/C
The readme.f doesn’t say this is GHCN nightlights,
but I recall an old file that had both GHCN nightlights and GHCN brightness index ..
Anyway, it will be nice when there is a metadata database. Looking forward to it.
Ok, run a screen and screen out GHCN that are
CO or LA or A.
Selecting GHCN = RURAL, COAST =no, LAKE=no, Airport = NO, should get you 1975
rural stations that are not on the coast or near a lake or at an airport.
(by decoding the land use string.)
Then compare GHCN rural ( no coast, no lake, no airport) with GRUMP.
Could give you some kind of idea about how much is gridding issues on the coast and around lakes and airports.
If you apply the following screen
GHCN = rural
GHCN = No Coast, no lake, no airport
GRUMP = rural.
You end up with 1564 Stations.
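A screen like this could be sketched as a simple filter. The field names and the three sample records below are invented for illustration; they are not the actual column names or stations in the inventory.

```python
def is_clean_rural(station):
    """True if the station passes the rural / no coast / no lake / no airport screen."""
    return (
        station["ghcn_flag"] == "R"
        and station["grump"] == "rural"
        and not station["coast"]
        and not station["lake"]
        and not station["airport"]
    )


# Hypothetical records with made-up IDs.
stations = [
    {"id": "00000000001", "ghcn_flag": "R", "grump": "rural",
     "coast": False, "lake": False, "airport": False},
    {"id": "00000000002", "ghcn_flag": "R", "grump": "urban",
     "coast": False, "lake": False, "airport": False},
    {"id": "00000000003", "ghcn_flag": "R", "grump": "rural",
     "coast": True, "lake": False, "airport": False},
]

selected = [s["id"] for s in stations if is_clean_rural(s)]
```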
Since the GRUMP urban extent is based on 1995 nightlights and 1993 data,
I think you would want to ensure that these places have remained rural by using
more up-to-date data: population, nightlights, or both.
Mosh,
If you give me a list of station IDs, I’d be happy to run them through the model and see what temp trend they give, as well as compare them to other stations in the same grid cells
Better than that Zeke!
https://surfacestations.dabbledb.com
You can just select a view ( Grump Rural, no water no airport )
Go to the format options: select either xls or Csv and download the Inventory.
If you like I can add you as an admin, and Ron too.
Basically, if you create a file or a URL (like Ron did) for a data file, then it can be imported
and merged, provided you use the full GHCN id.
You need:
GHCN identifier ( I read it in as text )
New Field, New Field, new Field, etc etc
I did mine a bit differently than yours: I keep the full GHCN identifier (country+WMO+mod)
and then I split it. You’ll see.
Anyway, if you post your stuff (GHCN identifier, data, data, data),
the program will figure out the rest.
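For anyone preparing a file to merge, here is a rough sketch of the split. It assumes the 11-digit GHCN v2 station identifier layout (3-digit country code, 5-digit WMO number, 3-digit modifier) and keeps the ID as text so that it is never reformatted as a number. The function name is made up, and the ID in the usage line is only an illustrative value.

```python
def split_ghcn_id(ghcn_id):
    """Split an 11-digit GHCN v2 station ID into (country, wmo, modifier) strings."""
    if len(ghcn_id) != 11 or not ghcn_id.isdigit():
        raise ValueError(f"expected 11 digits, got {ghcn_id!r}")
    return ghcn_id[:3], ghcn_id[3:8], ghcn_id[8:]


country, wmo, mod = split_ghcn_id("42572597000")
```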
Just finished it this morning.
here’s a list:
GHCN=Rural
GRUMP =Rural
Coast = no
Airport = No
http://surfacestations.dabbledb.com/publish/surfacestations/a26e0aa6-10ff-4452-8e5e-7d186afb3ed7/zekelistnocoastnoairportruralrural.html
Ron,
You can see as well that the vast majority of GRUMPs that are unidentified are
either on the coast or by a lake.
Only 15 GRUMP unidentified are Coast=No, Lake =no.
http://surfacestations.dabbledb.com/publish/surfacestations/e8f59633-8856-42b4-a2f2-560ec3a82de0/grumpundefinednotcoastorlake.html
Mosh: here you go
http://i81.photobucket.com/albums/j237/hausfath/Picture189.png
1900-2009 trends:
All stations – 0.075
Mosh stations – 0.072
1960-2009 trends:
All stations – 0.202
Mosh stations – 0.232
Unfortunately, the rural, no coast, no lake datafile has urban areas in it. Gerryville, Algeria AKA El Bayadh, Algeria was one.
Air travel has gone from 100 billion passenger-kilometers in 1960 to 3 trillion passenger-kilometers in 2000 while global population went from 3 billion to 6 billion.
A 30 fold increase in air travel. Where are the majority of the thermometers?
Heh! dabbledb, huh.
Blocked at work.
kinda fun.
Anybody know what the definition of the A/B/C categorization of brightness is?
Thanks harry. I’m sure the screen I suggested is not perfect. The point is to use a screen to get a list of stations that can be scrutinized in more detail, and then add to that list.
The other point is this: we may be able to find a UHI signal if we compare the best to the worst. I would expect to. We may also find that this difference, while real, is negligible. In everything I’ve looked at it’s clear that the signal is NOT huge. I see nothing wrong with getting down to the end and stating that UHI is within the noise floor. Going in, I expect it to be bigger than Jones’ .05/century, but not more than, say, .15.
Nah, I looked for that a while back. Maybe it comes out of Peterson 2003? I would ask them but my name is mud. Fair enough.
Cool, thanks. At some point I want to add density to the metadata. Can you post a list of ghcnid and associated density? I’ll add it in.
Ya, it is kinda fun. Quick and dirty. It’s cool too in that I can just click on a location and go to google
…sometimes
Link from the previous post:
http://rhinohide.org/rhinohide.cx/co2/ghcn/data/ghcn_station_inv_popdensity.txt
But that comes with the big caveat that popdensities are known to be too low in countries that don’t have good resolutions.
Also, I’m going to be redoing my method for data extraction. Been working with the java-netcdf code in order to extract data from the DMSP files – it should work well with the GPW stuff as well.
Mosh: slightly OT, but you should tell Anthony that rather silly articles like http://wattsupwiththat.com/2010/03/16/rewriting-the-decline/ really don’t help advance the debate… I don’t even want to try and guess what shape temp databases were in during the 70s before the WMO and NCDC decided to work on collecting and assembling records. Oddly enough, the worries over declining temps in the 70s might have provided some of the inspiration for the initial WMO efforts to collect station data (which were in the late 70s/early 80s iirc).
Look on page 1327 of this.
http://ams.allenpress.com/archive/1520-0477/89/9/pdf/i1520-0477-89-9-1325.pdf
Had nothing to do with adjustments; it was data cover.
See here.
http://data.giss.nasa.gov/gistemp/graphs/Fig.A3.lrg.gif
See the difference between NH and SH? At the time, there wasn’t good coverage in the SH.
Quality control is questionable at WUWT, as always.
Zeke,
I’ve been meaning to post on that article. After a couple paragraphs I just threw up my hands. I see no point in pulling up old graphs (kinda like the Lamb diagram). Perhaps I should go over and say something. I prefer to do my comments in public. On one occasion I have been asked [not by anthony] to review stuff [not by anthony] before it went up. My response was don’t publish. That advice was either not passed on or my vote wasn’t enough. Since then I’ve been asked to review one other piece and I declined on principle. I’d rather make my comments public.
WRT Anthony’s editorial choices: his principles are not mine. Just from an external perspective
I think he seems quite happy to post many things that are on the edge and let his readers sort out the wheat from the chaff. That’s an interesting experience. I suppose your view of things is that WUWT is the National Enquirer. Sometimes, however, the Enquirer gets things right. This puts a huge burden on a reader like me. I’d like some of the chaff separated. But strangely, people are drawn to a site where there is an ACTIVE sorting of the wheat from the chaff. Go figure. In the internet era people want to be a part of the editorial review. They don’t trust the news unless they can talk back to it. Maybe I’ll write about this phenomenon.
let’s just say that by old standards Realclimate “should” have more credibility than any other site.
but it doesn’t. Why is that? Well, you can blame the readers and say they are ignorant, but I’m thinking it has more to do with this: it’s the internet. People expect to be able to interact freely. They believe in the power of two-way communication. Gosh, I remember my first time on Slate
(I used to hang there with glenn reynolds) when the author of an article came down into the comments to argue with me. Damn. A conversation. I’m postulating that the internet changes our conception of how we come to truth, or rather returns it to an experience rooted in dialog.
As irrational as this seems, I trust gavin and judith curry more than someone like jones, merely because they dialog with their opponents (gavin sometimes posts to other blogs). I trust Rob Wilson more than Briffa, and Mann not at all. Part of this of course is reading that Mann has a policy of ignoring. And of course I trust you and ron and the guys at CCC more than Tamino.
Again, because you dialogue and share your work and TOOLS. Those tools empower me.
If you share your tools ( code) you are sharing your power. That builds huge trust.
Finally, let me let you in on a little bit from behind the curtain in skeptic ville (luke warmers as well). On a few occasions I have made private suggestions to all the notable “skeptics” about things I would look at. Result? ZERO.
These damn people just do what the hell they are interested in. There is another funny thing: once they get interested in a thing they don’t let go. This is just an observation. I think if people in the warmist camp understood that some of their opponents have “hobby horses” they could deal with them more effectively. If you view them as part of a conspiracy, then they just secretly laugh to themselves, because you got them wrong. When you characterize them [not you zeke, I'm not saying you do this] as part of an organized effort then they just have contempt for you because you get them entirely wrong. Then they ride that hobby horse harder and longer.
Thanks Ron.
Your early version of the density data is now up at
http://surfacestations.dabbledb.com/publish/surfacestations
Some notes:
For some stations you had a 9999 flag for density.
there were also stations ( maybe islands? ) where you had no data. I flagged those as 99999
I created a view for folks to look at or download: population density.
I also took the list I asked Zeke to run ( no coast, no airport, ruralghcn, rural grump) and sorted
it by density.
Looks like some work is required. I will add in GISS nightlights, but the problem there is GISS drops stations from GHCN. so I’d rather put in the source data..
However, until all the positions are fixed this is largely busy work.
For uploading, a CSV file works great, as I can just point to a CSV file as a URL.
But sometimes the tool thinks lines of data are just text, like with a semicolon separator.
I can also read in column headers.
ghcnid,lat,lon,metadata
xxxxx,xx,xx,x
xxxxx,xx,xx,x
Missing data is ok, but just a pain
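A file in that layout can be read with Python's standard `csv` module. The sketch below assumes the header shown above, keeps `ghcnid` as text, and turns empty fields into `None`. The function names and the two sample rows (with made-up IDs) are only for illustration, and this is not how the dabbledb importer itself works.

```python
import csv
import io


def _clean(value):
    """Strip a field; treat missing or empty fields as None."""
    value = (value or "").strip()
    return value or None


def read_station_rows(text):
    """Parse CSV text with a ghcnid,lat,lon,metadata header into dicts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        lat, lon = _clean(rec.get("lat")), _clean(rec.get("lon"))
        rows.append({
            "ghcnid": _clean(rec.get("ghcnid")),  # kept as text, never int()
            "lat": float(lat) if lat is not None else None,
            "lon": float(lon) if lon is not None else None,
            "metadata": _clean(rec.get("metadata")),
        })
    return rows


# Hypothetical sample; the second row has missing data.
sample = """ghcnid,lat,lon,metadata
00000000001,43.95,141.63,R
00000000002,,,
"""
rows = read_station_rows(sample)
```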
Mosh,
Ron should have his own version of nightlight data up shortly. The only issue is that the current scale used (0 to 63 for brightness) doesn’t correspond to the one used in the initial readings from the satellites at the time that the dark/dim/bright designations were created.
I looked at the distribution and somewhat arbitrarily set 0-20 as Dark, 20-40 as Dim, and 40-63 as Bright, which seems to work okay, but we probably should try and figure out if there is a better way.
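In code, that binning might look like the sketch below. The ranges as written share their endpoints (20 and 40), so this version assumes the lower bound of each bin is inclusive: 20 counts as Dim and 40 as Bright. That tie-break is an assumption, not something settled above.

```python
def brightness_class(value):
    """Bin a 0-63 brightness value into Dark / Dim / Bright.

    Boundaries are assumed half-open: [0, 20) Dark, [20, 40) Dim, [40, 63] Bright.
    """
    if not 0 <= value <= 63:
        raise ValueError(f"brightness out of range: {value}")
    if value < 20:
        return "Dark"
    if value < 40:
        return "Dim"
    return "Bright"
```

Keeping the raw 0-63 value alongside the label means the cutoffs can be changed later without going back to the source files.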
Ok,
I read in NASA nightlights (from v2) and then to my shock realized that the .inv file from GISTEMP is POST step one, which means some stations have been added (Antarctica), and
I know that GISTEMP drops some strange stations.
Funny, I was looking at the NASA scale and saw numbers out past one hundred, and I swore that
Imhoff 97 was scaled to 0-100. Anyway, it looks like they have different scales. The other
thing I thought of was that it’s best to preserve the raw data and not bin it. Trying to track down that GHCN designation (A/B/C) is the kind of nightmare binning creates for follow-on analysts.
Anyways, lots of balls up in the air. jeffId has a nice post on anomalies and roman’s method
this stuff is way more fun than tree rings
pop density added back in after system restore in case you guys were looking for it during my crash. Their restore points are far apart, so I have to do manual back ups from now on.
For some reason I can’t download the Nightlights source data. Maybe they block me ’cause I’m not .edu or .gov?
No worries. Cluttering up my business laptop with this stuff is getting unwieldy, so maybe I can justify a new system!
The files here?
http://www.ngdc.noaa.gov/dmsp/downloadV4composites.html
They are really big. About 300 MB compressed. Could well be a smart proxy server blocking you.
GISS v2.inv includes …
A/B/C with modifiers 1/2/3 (ie C_ or C1 are possible)
… and …
0-186
Ya ron those files. No joy for me. i’ll try again
works now.
J. Hansen, R. Ruedy, M. Sato, and K. Lo, 2010, Current GISS Global Surface Temperature Analysis (draft)
I like this quote from the paper
“Temperature records in the United States are especially prone to uncertainty”
Population by continent…Africa went from 366 million in 1970 to over 1 billion now.
Asia went from 2 billion to 4 billion. US population went from 200 million to 300 million. Europe is pretty much flat.
https://www.census.gov/compendia/statab/2010/tables/10s1294.xls
http://www.fao.org/docrep/009/a0310e/A0310E06.htm#ch3
Thanks Ron. It was good reading.
The great thing in my mind is that now people have some tools in their hands to do their own
hypothesis testing, either with the CCC version of GISTEMP, or Zeke’s stuff or CRU’s approach.
JeffId is also working up a version that should be interesting.
As people find that these methods give largely the same answer, questions about method
will focus in on issues that are incompletely described ( in my mind)
1. Issues with how GISS handles the Arctic. (Tilo had an interesting post on this.)
This is one of those “improvements” that doesn’t change the big answer but just
looks more closely at the (small) difference between CRU and GISTEMP.
2. How averaging is done for grids that are over land and water. CRU and GISTEMP
do it differently. Again, an area for improvement without substantial change.
3. Various approaches to spatial averaging. (Improvements that don’t change the final answer.)
4. Calculations of error due to sampling (spatial coverage). Improvements that communicate better the level of certainty in the data.
Next, are the issues of metadata:
1. Having accurate locations for sites ( besides USHCN)
2. A good set of alternative proxies for urban/rural studies.
And then the issues of raw data and its adjustment:
1. TOBS adjustments in the US and ROW
2. Adjustments for instrument changes.
3. Microsite degradation. (Some interesting posts on Lucia’s site.)
Interesting. Good description of the benefits and limitations of each approach. I bet we will see some new pop density databases soon as high-res satellite imaging and better object-identification algorithms are developed.
Heck, back in 2005 when that report was written Google Earth had just come out :-p