
GHCN Station Inventory: A Reconstruction

2010 May 16

Introduction

GHCNv2 was released in 1997. The climate data included a ‘station inventory’ file with several metadata fields designed to provide additional data about the conditions surrounding the stations. Much of this metadata was extracted by manual interpretation of Operational Navigation Charts (ONC). In the intervening years, GIS tools and data sets have become readily available. Here, these tools and data sets are used to examine alternate sources for the metadata provided in the GHCNv2 station inventory, and an initial station inventory reconstruction based on these data sources is provided.

A sample of an ONC is shown below with many of the features used to derive GHCNv2 metadata including elevation, populations, topography, marshes, vegetation, coasts, and airports.

Sample ONC

Interpolated Elevation

GHCN has two elevation fields. The first is the original station elevation in meters (missing values are coded as -999). The second is the station elevation interpolated from the TerrainBase gridded data set.

ETOPO1 Elevation

Like most station databases, these metadata start off with station name, latitude, longitude, and elevation. Wherever possible, these were obtained from the current WMO station listings (WMO 1996b). Some stations in GHCN do not have elevation metadata. To provide all stations with some elevation information, an elevation value was interpolated to the station location from a 5-min gridded elevation database (Row and Hastings 1994) and this elevation is provided in addition to official station elevations. In areas with significant orography, these interpolated metadata will have limited specific accuracy. But they can provide useful information about the station’s elevation.

Peterson and Vose, 1997

The TerrainBase Global Land Elevation and Ocean Depth data set is the one described by Peterson and Vose (1997) as the source of the GHCN interpolated elevations. It is available for download here: http://dss.ucar.edu/datasets/ds759.2/matrix.html (registration required). However, I found a better match between the GHCN original and interpolated elevation fields with the ETOPO1 data than I did with the TerrainBase data.

The ETOPO1 data is available in “grid registered” and “cell registered” versions. The difference is one I’ve alluded to in previous posts. If you think of the geotiff data as a checkerboard, then “grid registered” places the lat,lon in the center of the cell while “cell registered” places the lat,lon on the intersections.
http://www.ngdc.noaa.gov/mgg/global/global.html
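The practical effect of the two registrations shows up when a station longitude is converted to a column index. The following is a minimal sketch, assuming a global grid whose first column sits at 180W and a 1 arc-minute cell size. The function name and its parameters are hypothetical and are not taken from any ETOPO1 tool.

```python
import math

def column_index(lon, cell_deg=1.0 / 60.0, west=-180.0, registration="grid"):
    """Return the column whose cell contains the given longitude.

    registration="grid": whole lat,lon values sit at cell centers,
                         so column 0 is centered on `west`.
    registration="cell": whole lat,lon values sit on cell edges,
                         so column 0 spans west .. west + cell_deg.
    """
    offset = (lon - west) / cell_deg
    if registration == "grid":
        return int(math.floor(offset + 0.5))  # nearest cell center
    if registration == "cell":
        return int(math.floor(offset))        # cell whose edges bracket lon
    raise ValueError("registration must be 'grid' or 'cell'")

# A point 0.6 of a cell east of 180W lands in different columns.
lon = -180.0 + 0.6 / 60.0
print(column_index(lon, registration="grid"))  # 1
print(column_index(lon, registration="cell"))  # 0
```

The same half-cell shift applies to rows, which is one reason an offset may or may not be needed when sampling a raster at a station location.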

Standard Deviation = 146m (interpolation, no offset)
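A figure like this can be computed as the standard deviation of the differences between two elevation columns. The sketch below assumes that reading; the post does not say whether a population or sample standard deviation was used, and the sample values are hypothetical rather than real station data.

```python
import statistics

MISSING = -999  # GHCN code for a missing station elevation

def elevation_spread(pairs):
    """Standard deviation of (reported - interpolated) elevation, in meters.

    `pairs` is an iterable of (reported_m, interpolated_m); pairs with a
    missing value are skipped.
    """
    diffs = [rep - interp for rep, interp in pairs
             if rep != MISSING and interp != MISSING]
    return statistics.pstdev(diffs)

# Hypothetical values for illustration only, not real station data.
sample = [(120, 110), (300, 310), (-999, 45)]
print(elevation_spread(sample))  # 10.0
```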

Population and Rural/Urban Extents

The GHCN inventory reader program describes these fields as follows:
c pop=1 character population assessment: R = rural (not associated
c with a town of >10,000 population), S = associated with a small
c town (10,000-50,000), U = associated with an urban area (>50,000)
c ipop=population of the small town or urban area (needs to be multiplied
c by 1,000). If rural, no analysis: -9.
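The thresholds in these comments can be written as a small lookup. This is only a sketch of the rule as stated in the reader comments (the original assignments were made by hand from charts), and the function name is hypothetical.

```python
def population_class(ipop):
    """Map the inventory's ipop value (thousands of people) to R, S or U."""
    if ipop < 0:            # -9 means rural, no population analysis
        return "R"
    people = ipop * 1000
    if people > 50_000:     # urban area
        return "U"
    if people >= 10_000:    # small town, 10,000-50,000
        return "S"
    return "R"

for value in (-9, 25, 50, 120):
    print(value, population_class(value))
```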

Gridded Population

Population. Examining the station location on an ONC would determine whether the station was in a rural or urban area. If it was an urban area, the population of the city was determined from a variety of sources. We have three population classifications: rural, not associated with a town larger than 10 000 people; small town, located in a town with 10 000 to 50 000 inhabitants; and urban, a city of more than 50 000. In addition to this general classification, for small towns and cities, the approximate population is provided.

These population metadata represent a valuable tool for climate analysis; however, the user must bear in mind the limitations of these metadata. While we used the most recent ONC available, in some cases the charts or the information used to create the charts were compiled a decade ago or even earlier. In such cases the urban boundaries in rapidly growing areas were no longer accurate. The same is true for the urban populations. Wherever possible, we used population data from the then-current United Nations Demographic Yearbook (United Nations 1993). Unfortunately, only cities of 100 000 or more inhabitants were listed in the yearbook. For smaller cities we used population data from several recent atlases. Again, although the atlases were recent, we do not know the date of source of the data that went into creating the atlases. Additionally, this represents only one moment in time; an urban station of today may have been on a farm 50 years ago, though it is probably valid to assume that if a station is designated rural now, it was most likely rural 50 years ago. Knowing the importance of avoiding the effect of urban warming by preferring rural stations in climate analysis, these population metadata have been used as one of the criteria in the initial selection of the Global Climate Observing System (GCOS) Surface Network (Peterson et al. 1997a).

Peterson and Vose, 1997

GPW GRUMP Rural/Urban Extents

Resolution is 30 arc-seconds. Values are 0=undef, 1=rural, 2=urban. The downloaded ASCII files are converted to GeoTiff with gdal_translate. Data is located here:
http://sedac.ciesin.columbia.edu/gpw/global.jsp

There are 3912 rural stations, 1409 suburban stations, and 1959 urban stations in the GHCN station inventory. I use the term ‘suburban’ as a synonym for ‘small town.’

GHCN-GRUMP
Rural-Rural matches: 3061 / 3912 => 78%
Urban-Urban matches: 1695 / 1959 => 87%
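A tally of this kind can be sketched as follows. Only the rural and urban comparisons are reported above, so leaving the ‘S’ stations out of the count is an assumption, and the sample pairs are hypothetical.

```python
from collections import Counter

# GRUMP raster values as described above: 0=undef, 1=rural, 2=urban.
EXPECTED = {"R": 1, "U": 2}  # GHCN code -> GRUMP value counted as a match

def match_rates(pairs):
    """pairs: iterable of (ghcn_code, grump_value). Returns {code: (hits, total)}."""
    hits, totals = Counter(), Counter()
    for code, grump in pairs:
        if code not in EXPECTED:   # 'S' stations are left out of the tally
            continue
        totals[code] += 1
        if grump == EXPECTED[code]:
            hits[code] += 1
    return {code: (hits[code], totals[code]) for code in totals}

# Hypothetical pairs, for illustration only.
sample = [("R", 1), ("R", 2), ("U", 2), ("S", 2), ("U", 2)]
for code, (hit, total) in match_rates(sample).items():
    print(f"{code}: {hit} / {total} => {round(100 * hit / total)}%")
```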

GPW GRUMP Population

Resolution is 30 arc-seconds. Values are the population count in each grid cell. Coverage is from 84N to 56S. The range of population in the GeoTiff is from -32767 to 32767(?). The range of the population in the station inventory is from 0-24187. There are 42 stations that have ‘na’ (-9); most of these are in Antarctica, outside the coverage, but there are three other cases: ‘Abbaissia/Cairo HQ’, ‘Dhaka’, and ‘Royal Observa’. There is no obvious method for comparing the gridded population with the GHCN ‘population of nearest town.’

Topography

Among the metadata in the GHCN station inventory file is a topographical landform classification of four types: flat (FL), hilly (HI), mountain valleys (MV), and mountain tops (MT).

Meybeck 2001

Topography. ONC make detailed orography available to pilots. We used this information to classify the topography around the station as flat, hilly, or mountainous. Additionally we differentiated between mountain valley stations and the few mountaintop stations that can provide unique insights into the climate of their regions.

Peterson and Vose, 1997

The Meybeck classification is downloadable as a global JPEG image. The resolution is roughly 1/10th of a degree. The EUSOILS web site does offer a much higher resolution image through their ‘terrain viewer’ – but not as a single image.

The image processing program GIMP was used to open the image, crop the white space border, convert it to greyscale and save it to a new tiff file. Initial values for the longitude of the western edge, the latitude range, and a pixel resolution of 1/10th degree were estimated. This information was used to build a ‘tfw’ file which was processed by gdal_translate to create a GeoTiff file. Four islands on the greyscale image were used for calibration: Jarvis Island (W), Lord Howe Island (E), and unlabeled islands off the coasts of Antarctica (S) and Svalbard (N). These were used to adjust the georeferencing values for the GeoTiff conversion.
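For readers unfamiliar with the ‘tfw’ (world file) format, it is six numbers, one per line: pixel width, two rotation terms, negative pixel height, and the coordinates of the center of the top-left pixel. The sketch below builds that text; the extent passed in is a hypothetical placeholder, since the values actually estimated for the Meybeck image are not given here, and the function name is my own.

```python
def world_file_text(west_edge, north_edge, pixel_deg):
    """Return the six lines of a world file for a north-up image.

    west_edge, north_edge: coordinates of the image's outer top-left corner.
    pixel_deg: pixel size in degrees (square pixels assumed).
    """
    values = [
        pixel_deg,                     # x size of a pixel
        0.0,                           # rotation term (none)
        0.0,                           # rotation term (none)
        -pixel_deg,                    # y size, negative because rows run south
        west_edge + pixel_deg / 2.0,   # x of the center of the top-left pixel
        north_edge - pixel_deg / 2.0,  # y of the center of the top-left pixel
    ]
    return "\n".join(f"{v:.6f}" for v in values) + "\n"

# Hypothetical extent; the values estimated for the real image are not given.
print(world_file_text(west_edge=-180.0, north_edge=90.0, pixel_deg=0.1))
```

Adjusting the calibration against known islands then amounts to nudging the last two numbers (and, if needed, the pixel size) until the islands land on their true coordinates.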

GHCN contains both “Mountain Top” (MT) and “Mountain Valley” (MV) classifications. However, there are only 61 of the MT types and for the purposes of this exercise, I have converted those to MV types.

TOPO   GHCN  MEYBECK  MATCH
FL     2779     1326    48%
HI     3006     1705    57%
MV     1495     1049    70%
---------------------------
       7280     4080    56%

Station Vegetation

The GHCN v2 fortran reader comments:
c stveg=general vegetation near the station based on Operational
c Navigation Charts; MA marsh; FO forested; IC ice; DE desert;
c CL clear or open;
c not all stations have this information in which case: xx.

NEO LC

Vegetation. If the station is rural, the vegetation for that location is documented. The classifications used on the ONC are forested, clear or open, marsh, ice, and desert. Not all ONC had complete vegetation data, so these metadata are not available for all stations. An additional source of vegetation data is included in GHCN metadata: the vegetation listed at the nearest grid point to each station in a 0.5° × 0.5° gridded vegetation dataset (Olson et al. 1983). This vegetation database creates a global vegetation map of 44 different land ecosystem complexes comprising seven broad groups. These metadata do not indicate the exact vegetation type at the station location, but they do provide useful information. In particular, an ecosystem classification can be used to some degree as a surrogate for climate regions since vegetation classes depend, to a large extent, on climate.

Peterson and Vose, 1997

In fact, there are no stations labeled “CL” in the GHCN v2 station inventory file (v2.temperature.inv) and fewer than 10% of the stations have any ONC vegetation label. As far as a ‘reconstruction’ of the station inventory goes, simply marking all stations with an ‘xx’ category achieves a 91% match rate.

NASA Earth Observations: Land Cover Classification (1 year – Terra/MODIS)

Available on the NASA Earth Observations site. Select the GeoTIFF option from the right hand corner. This data set uses the same IGBP land classification categories listed in the IGBP section immediately above.

Number of stations marked with listed vegetation type:

Type    GHCN  IGBP  Match  IGBP categories
desert   456   312    68%  7, 16
forest    79    42    53%  1, 2, 3, 4, 5
marsh     70     4     6%  11
ice       39    24    62%  15

Coastal and Coastal Distance

GHCN v2 includes 2 coastal fields: a two character string describing the coastal type (coastal, lake, island, or ‘no’) and a two character field giving the distance to the coast if it is 30km or less (-9 for na).

Spatial-Analyst.net Distance from Coast
Distance from Coast | http://spatial-analyst.net

Coastal locations. Oceanic influence on climate can be significant, so these metadata include (a) if the station is located on an island of less than 100 km² or less than 10 km in width at the station location, (b) if the station is located within 30 km of the coast it is labeled as coastal and the distance to the coast is provided, and (c) if the station is adjacent to a large (greater than 25 km²) lake, that too is noted because it can have an influence on a station’s climate.

Peterson and Vose, 1997

Once again, there is a good match-up between the GIS data and the GHCN metadata. Of the 7280 stations in the inventory, 4532 are listed as ‘not coastal’ in both data sets and 2072 are listed as ‘coastal’ in both data sets (defined as 30km or less from the coast). 108 stations are coastal in the GHCN data but not in the DCOAST data. 125 stations are coastal in DCOAST but not in GHCN. In addition, there are 443 lake stations (LA) in the GHCN data that are not addressed in this post. I calculate a (108+125)/7280 mismatch rate of 3.2%, which leads to a 96.8% match rate.
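The tally can be checked directly from the four agreement and disagreement counts and the lake total quoted above. This is plain arithmetic on the published figures; the variable names are my own.

```python
# Counts taken from the paragraph above.
both_inland = 4532    # 'not coastal' in GHCN and DCOAST
both_coastal = 2072   # 'coastal' in GHCN and DCOAST
ghcn_only = 108       # coastal in GHCN only
dcoast_only = 125     # coastal in DCOAST only
lake = 443            # GHCN lake stations, not evaluated

total = both_inland + both_coastal + ghcn_only + dcoast_only + lake
mismatch = (ghcn_only + dcoast_only) / total

print(total)                           # 7280
print(f"{100 * mismatch:.1f}%")        # 3.2%
print(f"{100 * (1 - mismatch):.1f}%")  # 96.8%
```

Note that the lake stations stay in the denominator, so the match rate is relative to the full inventory rather than to the stations actually compared.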

Airports and Airport Distance

The GHCN v2 fortran reader comments:
c airstn=A if the station is at an airport; otherwise x
c itowndis=the distance in km from the airport to its associated
c small town or urban center (not relevant for rural airports
c or non airport stations in which case: -9)

Airport locations. Airports are, of course, clearly marked on ONC charts. If a station is located at an airport, this information along with the distance from its associated city or small town (if present) are included as part of GHCN metadata.

Peterson and Vose, 1997

DAFIF or the Digital Aeronautical Flight Information File is a complete and comprehensive database of up-to-date aeronautical data, including information on airports, airways, airspaces, navigation data and other facts relevant to flying in the entire world, managed by the National Geospatial-Intelligence Agency (NGA).

http://en.wikipedia.org/wiki/DAFIF

DAFIF was once publicly available, but the US government withdrew public distribution in 2006 based on potential intellectual property claims. One open source for the redistribution of the older data is the Pacific Disaster Center (airports_dafif.zip). This data is in ARC vector format (shape files), which I do not yet know how to handle. Fortunately, spatial-analyst.net has a GeoTiff file (airports.zip) prepared from the shape file format.

Resolution is 1/20th degree (0.05 deg). Airports are presented as one of four classes:

Cat 1 : Active civil airports controlled and operated by civil authority primarily for use by civil aircraft

Cat 2 : Airports jointly controlled, used and/or operated by both civil and military agencies

Cat 3 : Active military airports controlled and operated by military authorities primarily for use by military aircraft

Cat 4 : Active airports having permanent type surface runways with less than the minimum facilities required for A, B, or C airports above

                DAFIF  GHCN  MATCH
GHCN AIRPORTS    1049  2390    44%
GHCN NO AIRPT    4620  4890    95%
----------------------------------
                 5669  7280    78%

Gridded Vegetation Index (Olson)

The Olson gridded vegetation is available in its original format at the CDIAC data repository.

Vegetation. If the station is rural, the vegetation for that location is documented. The classifications used on the ONC are forested, clear or open, marsh, ice, and desert. Not all ONC had complete vegetation data, so these metadata are not available for all stations. An additional source of vegetation data is included in GHCN metadata: the vegetation listed at the nearest grid point to each station in a 0.5° × 0.5° gridded vegetation dataset (Olson et al. 1983). This vegetation database creates a global vegetation map of 44 different land ecosystem complexes comprising seven broad groups. These metadata do not indicate the exact vegetation type at the station location, but they do provide useful information. In particular, an ecosystem classification can be used to some degree as a surrogate for climate regions since vegetation classes depend, to a large extent, on climate.

Peterson and Vose, 1997

The Olson World Ecosystem Complexes data is available from the Carbon Dioxide Information Analysis Center at the Oak Ridge National Laboratory. The dataset has a long history, with portions of it originating from a Jerry Olson project from 1970.
http://cdiac.ornl.gov/epubs/ndp/ndp017/ndp017.html

The Olson Ecosystem NDP data is available in two formats: a table formatted as an ARC/INFO interchange file (*.e00) and a long format where each row represents one lat/long cell (ndp017_g.dat). This latter data file includes the ecosystem label in the lat/lon row data. A custom Java data reader class was used to read the long file. The data resolution is 30′.

Match between GHCN and Whiteboard station gridded vegetation types:
Match: 6885
Mismatch: 395
94.5% match rate

Olson Vegetation   GHCN Count   Whiteboard Count
ANTARCTICA                 16                 17
BOGS, BOG WOODS            19                 19
COASTAL EDGES             127                131
COLD IRRIGATED              1                  0
COOL CONIFER              363                365
COOL CROPS                380                376
COOL DESERT                81                 82
COOL FIELD/WOODS          113                116
COOL FOR./FIELD           273                279
COOL GRASS/SHRUB          271                271
COOL IRRIGATED             36                 40
COOL MIXED                162                155
EQ. EVERGREEN              50                 46
E. SOUTH. TAIGA            12                 12
HEATHS, MOORS              19                 19
HIGHLAND SHRUB            127                128
HOT DESERT                161                156
ICE                         8                  9
LOW SCRUB                   2                  2
MAIN TAIGA                134                133
MARSH, SWAMP               60                 60
MED. GRAZING              121                123
NORTH. TAIGA               64                 63
PADDYLANDS                169                172
POLAR DESERT                7                  5
SAND DESERT                41                 42
SEMIARID WOODS             36                 35
SIBERIAN PARKS              9                  8
SOUTH. TAIGA               38                 35
SUCCULENT THORNS           58                 59
TROPICAL DRY FOR          107                104
TROP. MONTANE              61                 62
TROP. SAVANNA             140                135
TROP. SEASONAL            108                109
TUNDRA                    147                152
WARM CONIFER               63                 61
WARM CROPS                967                979
WARM DECIDUOUS            149                144
WARM FIELD WOODS          305                302
WARM FOR./FIELD           381                386
WARM GRASS/SHRUB          592                600
WARM IRRIGATED            101                 98
WARM MIXED                175                173
WATER                     999                988
WOODED TUNDRA              27                 29

GHCN A/B/C and GISS Brightness Index

The original GHCN v2 station inventory did not include brightness data. GHCN has since added a three-value (A,B,C) indicator of brightness (dark, dim, bright). GISS has included an additional field for the DMSP Radiance Calibrated data encoded in world_ave.tif.

DMSP

The Defense Meteorological Satellite Program (DMSP) is a Department of Defense (DOD) program run by the Air Force Space and Missile Systems Center (SMC). The DMSP program designs, builds, launches, and maintains several near-polar orbiting, sun synchronous satellites, monitoring the meteorological, oceanographic, and solar-terrestrial physics environments. DMSP satellites are in a near-polar, sun synchronous orbit at an altitude of approximately 830 kilometers (km) above the earth. Each satellite crosses any point on the earth up to two times a day and has an orbital period of about 101 minutes, thus providing nearly complete global coverage of clouds every six hours.

http://www.ncdc.noaa.gov/oa/usgcos/documents/soc_long.pdf

The binning match rate for the GHCN Brightness is as follows:
world_stable_lights.tif (offset required, reso=0.00833, max value=100)
       'A'   'B'   'C'   total  match  [a,c]
GHCN:  2169  826   2854  5849   80%    [1,60]

The standard deviation of the difference between the Brightness Index as published in the GISS v2.inv file and the value read from world_ave.tif is 17.5 W/m^2.

Results

The following station inventory files were prepared using the data sources described above:

v2.ghcn.compare.inv-latest.txt
v2.ghcn.inv-latest.txt
v2.ghcn.inv-latest.csv

v2.giss.compare.inv-latest.txt
v2.giss.inv-latest.txt
v2.giss.inv-latest.csv

Discussion

The Operational Navigation Charts (ONC) provided an interesting data source with near global coverage for the GHCN metadata. However, the manual interpretation needed to convert the chart data into digital format makes it difficult to review the accuracy of the metadata and seems almost quaint in this era of satellite imagery and GIS data processing. More importantly, it makes it difficult to extend the GHCNv2 station inventory with additional weather data sources.

Alternate digital data sources are publicly available for most of the data fields that were not originally derived from digital fields. One important exception is the incomplete Airport listing provided by DAFIF which may not include historical information about locations which are no longer working airports but were at the time of the weather data collection.

This project has exposed the importance of accurate location information in the station inventory for the purposes of obtaining information from alternate data sources. Much satellite data is available at 30 arc-second resolution. Many station locations seem to be no more accurate than 0.1 degree.
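To put the two resolutions side by side, here is a rough conversion to kilometers. It is a sketch that assumes a spherical Earth with about 111.32 km per degree of latitude; the constant and the function name are my own choices, not values from any of the data sets above.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude

def span_km(degrees, latitude=0.0):
    """Approximate north-south and east-west extent of an angular span."""
    north_south = degrees * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west

cell = 30.0 / 3600.0     # 30 arc-seconds in degrees
location = 0.1           # apparent precision of many station coordinates

print(span_km(cell))       # roughly 0.93 km at the equator
print(span_km(location))   # roughly 11.1 km at the equator
print(location / cell)     # about 12 cells per 0.1 degree
```

In other words, a station located only to the nearest 0.1 degree could fall anywhere within a block roughly 12 by 12 cells of a 30 arc-second raster.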

References

Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); The World Bank; and Centro Internacional de Agricultura Tropical (CIAT). 2004. Global Rural-Urban Mapping Project (GRUMP), Alpha Version: Population Grids, Population Density Grids. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University. Available at http://sedac.ciesin.columbia.edu/gpw

Elvidge, et al, 2003, Preliminary Results From Nighttime Lights Change Detection

European Commission – Joint Research Centre
Institute for Environment and Sustainability
http://eusoils.jrc.ec.europa.eu/projects/landform/

Gallo, Owen, Easterling, Jamason, 1999, Temperature Trends of the U.S. Historical Climatology Network Based on Satellite-Designated Land Use/Land Cover

GHCN v2.read.inv.f, source code for the GHCN station inventory reader

Hansen, et al, 2001, A closer look at United States and global surface temperature change

Hansen, et al, 2010, Current GISS Global Surface Temperature Analysis

Tomislav Hengl, http://spatial-analyst.net

Image and Data processing by NOAA’s National Geophysical Data Center.
DMSP data collected by the US Air Force Weather Agency.

Marc L. Imhoff, William T. Lawrence, David C. Stutzer, Christopher D. Elvidge, 1997, A technique for using composite DMSP/OLS “City Lights” satellite data to map urban area

Meybeck, M., P. Green and C. J. Vorosmarty (2001), A New Typology for Mountains and Other Relief Classes: An Application to Global Continental Water Resources and Population Distribution, Mount. Res. Dev., 21, 34 – 45.

Peterson and Vose, 1997, An Overview of the Global Historical Climatology Network Temperature Database

See also Nighttime Lights Change Detection (Slide Presentation)

  1. 2010 May 16 at 8:18 am

    Free at last! Free at last!
    Thank God Almighty, Free at last!

    That took longer than I expected. 😉

  2. carrot eater
    2010 May 16 at 10:18 am

    So now what? Would you say you’re pretty much done with the project?

    Are the NCDC guys aware you’re working on this? Maybe they might adopt some of your sources?

  3. 2010 May 16 at 11:59 am

    Oh yeah, I have something else coming, I left a hint ;-)…. “More importantly, it makes it difficult to extend the GHCNv2 station inventory with additional weather data sources.”

    As to using “my” sources, I think the lesson here is not to bother. If a newbie can go out and collect this information (and update it as he goes along), there is really no reason for NOAA/NCDC to bother doing it. Any competent science team will be able to get the data they want. And anything NOAA/NCDC publishes will likely be out-of-date in a couple of years anyway.

    What is more important is for NOAA/NCDC to ensure the accurate location information for their chosen data sets. And if they want to provide ‘metadata’, do it on station information that can’t be gleaned from maps or sat data. Historical airports, “surface-station” type data, wx station instrumentation, etc …

  4. steven Mosher
    2010 May 16 at 12:53 pm

    Great work Ron,

    I was looking into geonames.org to get better, more detailed place information (like country codes and administration level codes, like state and province) but hit a bug in R. Also
    geonames.org has pointers to more accurate elevation data (Shuttle mission data for 60S to 60N)

  5. carrot eater
    2010 May 16 at 1:03 pm

    Fair enough on all your points, but it is nice to have somebody else set it up and put it all in one place. After all, it took you some effort to get to this point.

    and i hope you send a note to NCDC – I think they’ll be impressed with your effort.

    Agreed on needing accurate locations. maybe something to suggest to them.

  6. steven Mosher
    2010 May 16 at 2:05 pm

    ya carrot, it would be cool to have a place were one could build the metadata from sources
    ( rather than download a stale file from Giss or noaa) compare that fresh metadata to the stored version. generate a *inv file. load up the ccc version of Gisstemp and run. or load up Nicks code and run. All from public sources all open source.

  7. 2010 May 17 at 2:11 pm

    I’ve definitely pointed some NCDC folks your way, though they tend not to comment on blogs.

    This has been a great resource to use for analysis, and hopefully we can work to turn it into a paper down the road.

  8. steven Mosher
    2010 May 18 at 8:39 am

    That’s great Zeke. I wouldn’t expect them to comment. I do think you have an excellent angle for a blog post. My thought has been that if the work can be reconstructed from sources, independently, you had a good story to tell. With Ron’s work you are a lot closer to that goal. Its also a good story when citizens contribute to the effort and when ‘scientists’ take notice and actually use the work, even if its a small contribution like making things neater, or more accessible or more up to date.

    Also, there are probably other folks who might want to help in their areas of expertise.
