Yesterday I posted the simple unsupervised cluster analysis of USHCN station 051294, Canon City, CO. It neatly divided into two classes which seemed to make a good match with the visible appearance of man-made features -v- vegetated landscape with a few bare ground spots thrown in to the man-made side of the classification.
But it doesn’t always work so easily. Before I tried USHCN 051294, I did a test run on a station picked at random, USHCN 199316, which is West Medway, Massachusetts.
Dividing this station's image into two classes, as in yesterday's exercise, we see that many areas within the natural vegetation are marked the same as obviously man-made areas. Indeed, the image is divided almost equally into two classes: 51.37% and 48.63%.
So we take the clustering code we looked at yesterday, which divided the image into two classes, and divide it into 3 classes instead.
i.cluster group=51294_18 subgroup=51294_18 classes=3 sigfile=51294_18_sig.txt reportfile=51294_18_rpt.txt
i.maxlik group=51294_18 subgroup=51294_18 sigfile=51294_18_sig.txt class=51294_18c_class reject=51294_18c_reject
A quick look shows that this is much closer to what we expect for a natural -v- man-made division. Most of the man-made stuff is in class 3. But it’s not perfect. The house in the lower left corner is mostly classed “natural” while the surrounding lawn is classed “man-made.” Note: the unsupervised classification is NOT dividing things by any predefined ‘natural’ or ‘man-made’ clusters – it’s just that the cluster analysis of the location of certain colors tends to lump MOST man-made landscapes into a single class – given the right number of classes.
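To make the "no predefined classes" point concrete, here is a minimal sketch of unsupervised clustering on pixel colors. It uses plain k-means, which is a swapped-in stand-in and not the algorithm behind i.cluster or i.maxlik; the pixel values and the function name kmeans are made up for illustration.

```python
from math import dist

def kmeans(pixels, k, iterations=20):
    """Toy k-means over RGB tuples; returns (centers, labels).

    Illustrative only: this is NOT the GRASS i.cluster algorithm.
    """
    if k > len(pixels):
        raise ValueError("need at least k pixels")
    # Deterministic start: take k pixels spread evenly through the list.
    step = len(pixels) // k
    centers = [pixels[i * step] for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in pixels]
        # Update step: move each center to the mean color of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Hypothetical pixels: three greens, three light grays, three dark shadows.
demo_pixels = [(40, 120, 50), (45, 130, 55), (50, 125, 60),
               (180, 180, 175), (190, 185, 180), (175, 178, 170),
               (20, 25, 22), (25, 30, 28), (18, 22, 20)]
demo_centers, demo_labels = kmeans(demo_pixels, k=3)
print(demo_labels)
```

The labels carry no meaning of their own. Whether a given class lines up with "man-made" or "natural" is something we read into the result afterwards, which is why the number of classes matters so much.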
Running the stats on the 3×3 smoothed version, we get these three classification coverages:
Which seems about right.
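For readers who want to see what such a coverage calculation involves, here is a small sketch on a toy class grid. It assumes the 3×3 smoothing is a majority (mode) filter, which the post does not state, and the grid values and function names are hypothetical.

```python
from collections import Counter

def mode_filter_3x3(grid):
    """Replace each cell with the most common class in its 3x3 neighborhood.

    Edge cells use whatever part of the window falls inside the grid.
    Ties keep the original class if it is among the winners, otherwise
    the lowest class id wins (an arbitrary choice for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            counts = Counter(window)
            best = max(counts.values())
            winners = sorted(k for k, v in counts.items() if v == best)
            out[r][c] = grid[r][c] if grid[r][c] in winners else winners[0]
    return out

def coverage(grid):
    """Percentage of cells in each class, keyed by class id."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {cls: round(100.0 * n / total, 2)
            for cls, n in sorted(counts.items())}

# Hypothetical classified image with one speckle of class 3.
raw = [[1, 1, 2, 2],
       [1, 3, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2]]
smoothed = mode_filter_3x3(raw)
print(coverage(raw))
print(coverage(smoothed))
```

Smoothing absorbs isolated pixels into their surroundings, so the coverage percentages shift slightly between the raw and smoothed versions.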
A confounding factor in the above is the presence of dark shingles and shaded trees which works against a clean separation of classes.
GRASS is the common name for Geographic Resources Analysis Support System. It has a huge GIS (Geographic Information Systems) toolkit and is under constant development. I’ve been meaning to take it out for a spin for some cluster analysis and surface classification. Better late than never.
A couple of months ago, I used RGoogleMaps to automate the download of some USHCN sites from Google Maps. We can use GRASS tools to classify the various pixel colors into an arbitrary set of classes using cluster analysis. I’ll use the same image, Canon City CO, that I used in the earlier post.
GHCNv2 was released in 1997. The climate data included a ‘station inventory’ file with several metadata fields designed to provide additional data about the conditions surrounding the stations. Much of this metadata was extracted by manual interpretation of Operational Navigation Charts (ONC). A sample is shown below. During the intervening years, GIS tools and data sets have become readily available. These tools and datasets are used to examine alternate sources for the metadata provided in the GHCNv2 station inventory. An initial station inventory reconstruction based on these data sources is provided.
Similar to the previous post, it’s time to review a previous thread, GHCNv2 and GRUMP Rural and Urban Extents, in regard to the GPWv3 GRUMP data for rural/urban extents in the context of the GHCN rural/smalltown/urban (R/S/U) classification.
Population. Examining the station location on an ONC would determine whether the station was in a rural or urban area. If it was an urban area, the population of the city was determined from a variety of sources. We have three population classifications: rural, not associated with a town larger than 10 000 people; small town, located in a town with 10 000 to 50 000 inhabitants; and urban, a city of more than 50 000. In addition to this general classification, for small towns and cities, the approximate population is provided.
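The three population classes quoted above reduce to two thresholds. A minimal sketch follows; the function name and the use of None for "no associated town" are assumptions of this example, and the R/S/U letters follow the shorthand used earlier on this page.

```python
def population_class(population):
    """Map an associated town's population to the R/S/U classes.

    population is None when the station is not associated with any town.
    Thresholds follow the quoted text: rural up to 10,000, small town
    from 10,000 to 50,000, urban above 50,000.
    """
    if population is None or population < 10_000:
        return "R"  # rural
    if population <= 50_000:
        return "S"  # small town
    return "U"      # urban

for pop in (None, 2_500, 10_000, 50_000, 50_001):
    print(pop, population_class(pop))
```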
These population metadata represent a valuable tool for climate analysis; however, the user must bear in mind the limitations of these metadata. While we used the most recent ONC available, in some cases the charts or the information used to create the charts were compiled a decade ago or even earlier. In such cases the urban boundaries in rapidly growing areas were no longer accurate. The same is true for the urban populations. Wherever possible, we used population data from the then-current United Nations Demographic Yearbook (United Nations 1993). Unfortunately, only cities of 100 000 or more inhabitants were listed in the yearbook. For smaller cities we used population data from several recent atlases. Again, although the atlases were recent, we do not know the date of source of the data that went into creating the atlases. Additionally, this represents only one moment in time; an urban station of today may have been on a farm 50 years ago, though it is probably valid to assume that if a station is designated rural now, it was most likely rural 50 years ago. Knowing the importance of avoiding the effect of urban warming by preferring rural stations in climate analysis, these population metadata have been used as one of the criteria in the initial selection of the Global Climate Observing System (GCOS) Surface Network (Peterson et al. 1997a).
My original post on the ‘brightness’ fields, DMSP: The Stars at Night, They are so Bright …, looked at the DMSP satellite ‘night light’ brightness data as used in GHCN and GISS. The brightness fields were not part of the original GHCN v2 metadata. GHCN adds an A/B/C indicator of brightness. GISS includes that but also adds a numerical value. The GISS value is derived from the DMSP “Radiance Calibrated” lights, a single data set prepared from data collected 1996-1997.
Airports are getting increasing attention from those looking at surface-records as they have become an increasing fraction of the currently reporting weather/climate stations.
Airport locations. Airports are, of course, clearly marked on ONC charts. If a station is located at an airport, this information along with the distance from its associated city or small town (if present) are included as part of GHCN metadata.
Among the metadata in the GHCN station inventory file is a topographical landform classification of four types: flat (FL), hilly (HI), mountain valley (MV), and mountain tops (MT).
Topography. ONC make detailed orography available to pilots. We used this information to classify the topography around the station as flat, hilly, or mountainous. Additionally we differentiated between mountain valley stations and the few mountaintop stations that can provide unique insights into the climate of their regions.
GHCN has two elevation fields. The first is the original station elevation in meters (na string is -999). The second is the station elevation interpolated from the TerrainBase gridded data set.
Like most station databases, these metadata start off with station name, latitude, longitude, and elevation. Wherever possible, these were obtained from the current WMO station listings (WMO 1996b). Some stations in GHCN do not have elevation metadata. To provide all stations with some elevation information, an elevation value was interpolated to the station location from a 5-min gridded elevation database (Row and Hastings 1994) and this elevation is provided in addition to official station elevations. In areas with significant orography, these interpolated metadata will have limited specific accuracy. But they can provide useful information about the station’s elevation.
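The quoted text says only that an elevation was "interpolated" from the 5-minute grid and does not name the method. As a sketch, assuming bilinear interpolation between the four surrounding grid nodes (the function name, grid layout, and sample elevations below are hypothetical):

```python
CELL_DEG = 5 / 60  # 5-minute grid spacing, in degrees

def bilinear(grid, lat0, lon0, lat, lon, cell_deg=CELL_DEG):
    """Bilinear interpolation on a regular lat/lon grid.

    grid[r][c] is the elevation at latitude lat0 + r * cell_deg and
    longitude lon0 + c * cell_deg (an assumed layout for this sketch).
    """
    y = (lat - lat0) / cell_deg   # fractional row position
    x = (lon - lon0) / cell_deg   # fractional column position
    rows, cols = len(grid), len(grid[0])
    if not (0 <= y <= rows - 1 and 0 <= x <= cols - 1):
        raise ValueError("point lies outside the grid")
    # Lower-left node of the enclosing cell, clamped so r+1 and c+1 exist.
    r = min(int(y), rows - 2)
    c = min(int(x), cols - 2)
    fy, fx = y - r, x - c
    south = grid[r][c] * (1 - fx) + grid[r][c + 1] * fx
    north = grid[r + 1][c] * (1 - fx) + grid[r + 1][c + 1] * fx
    return south * (1 - fy) + north * fy

# Hypothetical 2x2 patch of gridded elevations in meters.
patch = [[1600.0, 1700.0],
         [1800.0, 1900.0]]
corner = bilinear(patch, 38.0, -106.0, 38.0, -106.0)
center = bilinear(patch, 38.0, -106.0,
                  38.0 + CELL_DEG / 2, -106.0 + CELL_DEG / 2)
print(corner, round(center, 3))
```

In the toy patch the four nodes span 300 m over a single 5-minute cell, so a station's true elevation can sit well away from the interpolated value. That is the "limited specific accuracy" the quoted text warns about in areas with significant orography.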
The GHCN station inventory includes two vegetative descriptions. The first is a two character marker stveg which is described as “general vegetation near the station based on Operational Navigation Charts; MA marsh; FO forested; IC ice; DE desert; CL clear or open;”. The second is grveg which is described as “gridded vegetation for the 0.5×0.5 degree grid point closest to the station from a gridded vegetation data base.” The gridded vegetation is derived directly from the Olson World Ecosystem Complexes data.
I looked at the Olson gridded vegetation in a previous post Olson Ecosystem Complex: Grilled Veggies. In this post, I am looking at how various “Land Cover” data sets match up with the GHCN ONC ‘stveg’ data field.
GHCN v2 includes 2 coastal fields: a two character string describing the coastal type (coastal, lake, island, or ‘no’) and a two character field giving the distance to the coast if it is 30km or less (-9 for na). GHCN has this to say about the coastal location.
Coastal locations. Oceanic influence on climate can be significant, so these metadata include (a) if the station is located on an island of less than 100 km² or less than 10 km in width at the station location, (b) if the station is located within 30 km of the coast it is labeled as coastal and the distance to the coast is provided, and (c) if the station is adjacent to a large (greater than 25 km²) lake, that too is noted because it can have an influence on a station’s climate.
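The three rules in the quoted passage can be written down directly. The sketch below uses hypothetical argument names and a plain dictionary for the result, since the actual two-character inventory codes are not listed here; the -9 missing value for distance follows the field description above.

```python
NA_DISTANCE = -9  # "na" value for the distance-to-coast field

def coastal_flags(island_area_km2=None, island_width_km=None,
                  coast_distance_km=None, lake_area_km2=None):
    """Apply rules (a), (b), and (c) from the quoted text.

    Pass None for anything that does not apply, e.g. island_area_km2=None
    for a mainland station or lake_area_km2=None when no lake is adjacent.
    """
    # (a) small island: under 100 km^2, or under 10 km wide at the station
    island = ((island_area_km2 is not None and island_area_km2 < 100) or
              (island_width_km is not None and island_width_km < 10))
    # (b) coastal: within 30 km of the coast, with the distance recorded
    coastal = coast_distance_km is not None and coast_distance_km <= 30
    # (c) adjacent to a large lake: greater than 25 km^2
    lake = lake_area_km2 is not None and lake_area_km2 > 25
    return {
        "island": island,
        "coastal": coastal,
        "coast_distance_km": coast_distance_km if coastal else NA_DISTANCE,
        "lake": lake,
    }

print(coastal_flags(coast_distance_km=12))
print(coastal_flags(coast_distance_km=250, lake_area_km2=40))
```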
Peterson and Vose, 1997, An Overview of the Global Historical Climatology Network Temperature Database