2010 June 3

Data Monger

Since my laptop is on the brink, I’ve been shifting my data processing from my Sun VBox over to the ARM-based SheevaPlug (aka wall-wart). Previously, I’ve used it mostly as a file server front end, connected to a “My Book.” But I’ve been able to get most of my dev cycle working on it. “R”, “gdal-bin”, and “open-java” are all running, although it takes about 2 minutes for the JVM to launch: suitable for scripting but not great for development.

Quick dip back in the gene pool to grab some GPW population data. GPW data was downloaded as 2.5′ resolution BIL files and converted to GeoTiffs with gdal_translate. These were read by the GeoTiff data reader, and grid values were extracted for each GHCN station. Some GHCN station names have been changed to remove commas.
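For anyone wanting to reproduce the extraction step, here is a minimal sketch of looking up the grid cell for a station's latitude and longitude. It is not the GeoTiff reader mentioned above. The function names, the default origin (90N, 180W), and the in-memory list-of-rows grid are all assumptions for illustration. A real GPW raster has its own extent, which should be read from the file header.

```python
import math

def cell_index(lat, lon, north=90.0, west=-180.0, cells_per_degree=24):
    """Return (row, col) of the cell containing (lat, lon).

    Row 0 is the northernmost row, as in a north-up raster.
    24 cells per degree corresponds to a 2.5 arc-minute grid.
    The default origin is an assumption, not the actual GPW extent.
    """
    row = math.floor((north - lat) * cells_per_degree)
    col = math.floor((lon - west) * cells_per_degree)
    return row, col

def sample(grid, lat, lon, north, west, cells_per_degree, nodata=None):
    """Return the grid value at (lat, lon), or nodata if outside the grid."""
    row, col = cell_index(lat, lon, north, west, cells_per_degree)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata

# Toy 2x2 grid with 1-degree cells whose top-left corner is at 2N, 0E.
toy = [[10, 20],
       [30, 40]]
value = sample(toy, 0.25, 1.75, north=2.0, west=0.0, cells_per_degree=1)
```

Each station gets the value of the single cell it falls in. No interpolation or neighborhood averaging is attempted here.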

This is a CSV file with the 5-year population data from GPW (and a rural/urban flag) for each GHCN station.

This is a CSV file with the 5-year population density data from GPW (and a rural/urban flag) for each GHCN station.

This is a CSV file with yearly DMSP OLS v4 data for each GHCN station.

I’ve repackaged the ERSSTv3b data into a pseudo-GHCN format. There is a station for every grid point (2deg grid) with monthly data.
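As a rough illustration of what "pseudo-GHCN format" can mean, the sketch below writes one record in the GHCN v2 mean layout: an 11-character station ID, a 1-digit duplicate number, a 4-digit year, and twelve 5-character monthly values in tenths of a degree C, with -9999 for missing. The station ID and the grid extent (88S to 88N, 0E to 358E) used here are assumptions for illustration, not necessarily what the repackaged file uses.

```python
MISSING = -9999  # GHCN v2 missing-value marker

def format_v2_mean_line(station_id, year, monthly_degc, duplicate=0):
    """Format one year of monthly means as a GHCN v2 mean-style record.

    monthly_degc holds 12 values in degrees C; None marks a missing month.
    """
    if len(station_id) != 11:
        raise ValueError("station_id must be 11 characters")
    if len(monthly_degc) != 12:
        raise ValueError("need exactly 12 monthly values")
    fields = []
    for temp in monthly_degc:
        tenths = MISSING if temp is None else int(round(temp * 10))
        fields.append(f"{tenths:5d}")
    return f"{station_id}{duplicate:1d}{year:4d}" + "".join(fields)

# One pseudo-station per node of a 2-degree grid (assumed extent).
grid_points = [(lat, lon)
               for lat in range(-88, 89, 2)
               for lon in range(0, 360, 2)]

# Hypothetical station ID; the real ID scheme is not given in the post.
line = format_v2_mean_line("80000000001", 2009, [20.0] * 11 + [None])
```

Keeping the v2 layout means existing GHCN v2 readers can load the sea surface data without changes.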

Next up, per requests, I’ll probably process some more sea-based data.
And I’m still working through the automated build of the GSOD data.

A quick note on the naming convention. I’m not very consistent with this, but more or less as follows …
my = my hands were the last on the data
ghcn = ghcn v2 format for inv and mean, or ghcn station metadata
xxx[.xxx] = other data source being used
[inv|mean] = type of ghcn data set
[txt|csv] = fixed field or csv
[gz] = gzip
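
Since the convention is only loosely followed, a small helper can classify the tokens in a file name. This sketch assumes the tokens are dot-separated and may appear in any order. The function name and the example file name are made up for illustration.

```python
def describe_name(filename, sep="."):
    """Classify the tokens of a file name using the convention above."""
    info = {
        "mine": False,     # "my": my hands were the last on the data
        "ghcn": False,     # "ghcn": v2 format or station metadata
        "sources": [],     # any other token: other data source
        "kind": None,      # "inv" or "mean"
        "layout": None,    # "txt" (fixed field) or "csv"
        "gzip": False,     # "gz"
    }
    for token in filename.split(sep):
        if token == "my":
            info["mine"] = True
        elif token == "ghcn":
            info["ghcn"] = True
        elif token in ("inv", "mean"):
            info["kind"] = token
        elif token in ("txt", "csv"):
            info["layout"] = token
        elif token == "gz":
            info["gzip"] = True
        elif token:
            info["sources"].append(token)
    return info

# Hypothetical file name, not one of the actual downloads.
parsed = describe_name("my.ghcn.gpw.pop.inv.csv.gz")
```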

  1. carrot eater
    2010 June 3 at 6:25 pm

    You’re losing it.

  2. steven Mosher
    2010 June 3 at 9:30 pm

    those frickin commas in the names drove me batty. Also, in some NOAA files they use # sign

    I was thinking it might be nice to use standard country names and country numbers (fips) and
    get clean place names from geo names.. I spent (wasted) some time trying to replicate the merging of USHCN inventories with GHCN inventories ( giss has done the work but I wanted to see if it could be done from sources.. ugly )

  3. carrot eater
    2010 June 4 at 12:02 am

    The USHCN-GHCN merge step is something less than elegant.

  4. steven Mosher
    2010 June 4 at 2:24 am

    ya carrot, even at the level of the inventories.

    you take a ushcn inventory.. no ghcnidentifier, its a ushcn identifier. match that to the Ghcn inventory?

    1. the lat lons are different
    2. the names can be different.
    3. doing it by “distance” gets you about 95% of the way there.

    no climate science there.
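
The "by distance" matching described above can be sketched as a nearest-neighbor search using great-circle distance. This is a minimal illustration, not the GISS procedure. The station IDs, coordinates, and the 5 km cutoff are assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def match_by_distance(ushcn, ghcn, max_km=5.0):
    """Map each USHCN id to the nearest GHCN id within max_km, else None.

    Both inputs are dicts of id -> (lat, lon).
    """
    matches = {}
    for uid, (ulat, ulon) in ushcn.items():
        best_id, best_km = None, max_km
        for gid, (glat, glon) in ghcn.items():
            dist = haversine_km(ulat, ulon, glat, glon)
            if dist <= best_km:
                best_id, best_km = gid, dist
        matches[uid] = best_id
    return matches

# Hypothetical inventories for illustration only.
ushcn = {"U1": (40.0, -105.0), "U2": (10.0, 10.0)}
ghcn = {"G1": (40.01, -105.01), "G2": (41.0, -105.0)}
matches = match_by_distance(ushcn, ghcn)
```

Stations left unmatched (None) are the remaining few percent that need names or other metadata to resolve.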

  5. 2010 June 4 at 4:26 am

    I’ve been thinking the same for the gsod/ghcn data. Undoubtedly most of the ghcn climat stns are also in the gsod inventory. Fortunately, I have the wmo id to work from when I want to tackle this. But then there is the ushcn/gsod station matches …

  6. 2010 June 4 at 4:31 am

  7. steven Mosher
    2010 June 4 at 10:02 am

    Ron, Maybe you could ask the Noaa folks if they have a concordance of sorts.

    I’ve started with the COOP inventory.. I was going to look at the fsod, and the master inventory,

    Now Giss have done the work ( they have a table for converting from the ushcn identifier to the ghcn identifier) but it would be nice to be able to generate it from Noaa sources. Or perhaps the clearclimatecode guys can ask GISS for documentation on the translation tables.. these tables are just used as input on step1. Its all book keeping crap.

    Ideally, then when this stage is all done I think we can show that the work of GISS is all
    replicatable from sources. that their algorithms do not introduce any substantial bias ( see nick and zeke and chads work).

    That, in my mind leaves two open issues:

    1. The quality and accuracy of the metadata ( good independent work on your side)
    2. The adjustment algorithms.

  8. carrot eater
    2010 June 4 at 5:09 pm

    I think it’s time for you to take a vacation or something.

