
trb-0.12: HadSST2 Oceans

2010 August 10

Introduction


Notes
Not really an update to methods as much as incorporating a new product.

I’m using the NetCDF file for HadSST2. There is still an issue in my yearly average: a single NA in any month drops the entire year for that cell. I only noticed it just before posting and have marked it for a fix.
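
A likely fix (just a sketch here, not what the code below does yet): average with na.rm and require some minimum number of valid months before accepting the year. Something like:

# sketch: annual mean of one cell's 12 monthly values 'v', keeping the year
# only if at least 'min_months' months are present (10 is an arbitrary
# threshold chosen for illustration)
annual_mean <- function(v, min_months = 10) {
  if (sum(!is.na(v)) < min_months) return(NA)
  mean(v, na.rm = TRUE)
}
# applied to one year's [lon, lat, 12] block of the 'a' array built below:
#   b[,,y] <- apply(a[,,(y0*12+1):(y0*12+12)], c(1, 2), annual_mean)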

HadSST2 is a pre-processed data set combining the sea surface records in ICOADS with NCEP-GTS satellite data from 1998 to the present.

CRU and UKMET/HAD process this data set differently. From the FAQ: “We use the same gridded data but the way that CRU calculate the global annual average and the way that we do it are different. We average in time (monthly maps to annual maps) then in space (area average annual map to get a single number). CRU average in space (monthly map to single monthly average number) then in time (12 single monthly global averages average to get one single global annual average number)”
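
The two orderings are easy to sketch on one year of the [lon, lat, month] array built in the code below (a toy, unweighted version just to show the two orders; 'x' here stands for a single year's block):

# x: a [lon, lat, 12] block of monthly anomalies (e.g. one year of 'a' below)
# UKMET/Had order: average in time first, then in space
annual_map <- apply(x, c(1, 2), mean, na.rm = TRUE)  # 12 monthly maps -> one annual map
had_order  <- mean(annual_map, na.rm = TRUE)         # annual map -> one number
# CRU order: average in space first, then in time
monthly_gm <- apply(x, 3, mean, na.rm = TRUE)        # each monthly map -> one number
cru_order  <- mean(monthly_gm, na.rm = TRUE)         # 12 numbers -> one number
# with missing data (and area weighting) the two orders give slightly different answers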

In the following graphs I am using my mean of weighted latitudes. UKMET/Had uses something slightly different, which is simple enough to implement and is on my to-do list. Again from the FAQ: “First the monthly anomalies in each grid box are averaged together to give an annual average anomaly for that grid box. The area-weighted averages of these annual average grid-box anomalies are then calculated for the northern hemisphere and for the southern hemisphere. The global average temperature is the arithmetic mean of the northern hemisphere average and the southern hemisphere average. The last step avoids biasing the global average to the more densely observed northern hemisphere. There are, of course, other ways to calculate the global average and each will give a slightly different answer.”
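
In outline, that recipe applied to a single annual anomaly map looks like the sketch below ('ann' is a [lon, lat] annual map and 'lat' the grid-box latitudes; both names are assumed for the sketch):

# sketch of the UKMET/Had global mean: area-weight within each hemisphere,
# then take the plain average of the two hemispheric means
w  <- matrix(cos(pi * lat / 180), nrow(ann), ncol(ann), byrow = TRUE)  # area weights
nh <- lat > 0
gm_nh  <- sum(ann[, nh]  * w[, nh],  na.rm = TRUE) / sum(w[, nh][!is.na(ann[, nh])])
gm_sh  <- sum(ann[, !nh] * w[, !nh], na.rm = TRUE) / sum(w[, !nh][!is.na(ann[, !nh])])
global <- (gm_nh + gm_sh) / 2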

The UK Met Hadley Centre has one of the silliest IP licenses that I have ever read. Emphasis is in the original.

Restricted Data Access

The Met Office wish to monitor the use of this data and require an acknowledgement of the data source if they are used in any publication. The online application for access to the Met Office SST data includes the Met Office Agreement to be electronically accepted. Please note that the Met Office data sets are available for bona fide academic research only (sorry no undergraduates), on a per person per project basis (i.e. all members on a same project who will be using the data must individually apply for access to the data). If you wish to access the Met Office data for commercial or personal purposes, please contact the Met Office directly.

Your application for accessing the HadSST2 data will be processed within a day of receipt. Provided your application is complete and fully meets the Met Office conditions, a web account will be activated to allow you access to the HadSST2 directories via your login account from the BADC WWW Browse Archive pages.

Please read the 00README file available under the /badc/ukmo-hadsst2/ directory to guide you through the tree structure and the data directory.

http://badc.nerc.ac.uk/data/hadsst2/

Code

###############################################################################
# https://rhinohide.wordpress.com
###############################################################################
# 2010 08 10 : Create HadSST2 Data Reader Writer
###############################################################################
# http://hadobs.metoffice.com/hadsst2/data/HadSST2_1850on.nc.gz
###############################################################################

library(sp)
library(rgdal)
library(raster)
library(maps)
library(ncdf)

# define common variables
source("trb-init.R")

# define common functions
source("trb-common.R")

#----------------------------------------------------------------------------
# Download HadSST2_1850on.nc.gz
#----------------------------------------------------------------------------
# The HadSST2 data is provided for professional use only
# Please read the "Restricted Data Access"
# and license agreement before downloading
# http://badc.nerc.ac.uk/data/hadsst2/
#
if (! file.exists(hadsst2_data)) {
    cat("Please edit the file trb-display-hadsst2.R.\n")
    cat("There are links to license information\n")
    cat("and code to download the file.\n")
}

# ----------- UNCOMMENT ME TO DOWNLOAD HADSST2 -------------#
#if (! file.exists(hadsst2_data)) {
#    hadsst2_gz <- paste(hadsst2_data, ".gz", sep="")
#    download.file(hadsst2_data_url, hadsst2_gz)
#    # windows users will need gunzip in their PATH
#    system(paste("gunzip ",hadsst2_gz, sep=""))
#}
# ----------------------------------------------------------#

# open HadSST2 file
nc <- open.ncdf("../download/HadSST2_1850on.nc")
print(nc)

# get all values
sst <- get.var.ncdf(nc, varid="sst")
n <- sst[1] # RNetCDF gets the NA value right without this
sst[sst==n] <- NA

# measure sst set
lj <- nrow(sst) # number of lon bars
li <- ncol(sst) # number of lat bands
range <- ceiling(length(sst)/(li*lj))

# SST values in array [lon,lat,month]
a <- array(sst, c(lj,li,range))

# there may be a more efficient matrix multiplication method
yrange <- floor(range/12 + 1/12) # excludes partial years
b <- array(NA, c(lj,li,yrange))
for (y in 1:yrange) {
  y0 <- y -1

  # annual mean over 12 months for each lat/lon cell of the matrix
  # FIX-ME: a single NA for the year will wipe out this cell
  stty <- sst[,,(y0*12+1)]
  for (i in 1:11) { stty <- stty + sst[,,(y0*12+i+1)] }
  b[,,y] <- stty/12

  # TO-DO: this takes the mean of month 'n' (1-12)

  # TO-DO: this takes seasonal means (may want to shift the winter season)
}

# define the bounding box for the raster
xmn <- get.var.ncdf(nc, varid="lon")[1]
xmx <- get.var.ncdf(nc, varid="lon")[lj]
ymn <- get.var.ncdf(nc, varid="lat")[1]
ymx <- get.var.ncdf(nc, varid="lat")[li]

# create a raster file
mmr <- raster(nrows=li,ncols=lj,
            xmn=xmn,xmx=xmx, ymn=ymn,ymx=ymx,
            crs="+proj=longlat +datum=WGS84")

# create a stack to hold yrange (160) layers
# can I read the SST NC file as a brick?
mmry <- stack()
for (i in 1:yrange) {
  # reverse the latitude bands (north/south) to match the raster's top-down row order
  m <- b[,,i]
  n <- m
  for (h in 1:li) { n[,h] <- m[,li+1-h] }
  # write raster into a raster stack layer
  mmry <- stack(mmry, setValues(mmr,as.vector(n)),layer=i)
}

# define some dates for plotting
year0 <- 1850
year1 <- year0 + yrange -1
year00 <- year0 -1
ly <- seq(year0:year1)+year00
ly

# get the global annual means
#lm0 <- rep(NA,yrange) # unweighted mean of cells
#lm1 <- rep(NA,yrange) # unweighted mean of lats
lm2 <- rep(NA,yrange)  # weighted mean of lats
lm3 <- rep(NA,yrange)  # weighted mean of cells
for (i in 1:yrange) {
  #lm0[i] <- get_global_mean_w0(raster(mmry,layer=i))
  #lm1[i] <- get_global_mean_w1(raster(mmry,layer=i))
  lm2[i] <- get_global_mean_w2(raster(mmry,layer=i))
  lm3[i] <- get_global_mean_w3(raster(mmry,layer=i))
}

# hadsst data
h <- read.table("../../../data/hadsst.txt")
had <- h[1:160,]

# compare the mean methods
sum((lm2-had)*(lm2-had)) # 0.7808506
sum((lm3-had)*(lm3-had)) # 0.8075817

# look at various global trends
lm(lm2 ~ly)$coeff[2]
lm(lm2[(year1-year00-100):(year1-year00)] ~ly[(year1-year00-100):(year1-year00)])$coeff[2]
lm(lm2[(year1-year00-50):(year1-year00)] ~ly[(year1-year00-50):(year1-year00)])$coeff[2]
lm(lm2[(year1-year00-40):(year1-year00)] ~ly[(year1-year00-40):(year1-year00)])$coeff[2]
lm(lm2[(year1-year00-30):(year1-year00)] ~ly[(year1-year00-30):(year1-year00)])$coeff[2]
lm(lm2[(year1-year00-20):(year1-year00)] ~ly[(year1-year00-20):(year1-year00)])$coeff[2]
lm(lm2[(year1-year00-10):(year1-year00)] ~ly[(year1-year00-10):(year1-year00)])$coeff[2]

# plot the global annual mean
pfile <- paste("../img/", hadsst2_file,"-",year0,"-",year1,".png",sep="")
ptitle <- paste("HADSST2-",year0,"-",year1,sep="")
psub <- paste("Mean of the Weighted Means of Latitudes (trb-", version, ") ",
		"Baseline ",byear0,"-",byear1, sep="")
png(pfile, height=400,width=600)
#plot(lm2 ~ ly, type="l", main=ptitle, sub=psub)

# compare to hadsst (cru)
plot(had ~ ly, type="l", main=ptitle, sub=psub)
lines(lm2 ~ ly, col="blue")
dev.off()

# plot an Anomaly Map
y <- year1-year00 # the display year, this is last year in set
tit=paste("Global Gridded Mean Anomaly (",ly[y],
         ")\n Ocean Data: ",hadsst2_file, sep="")
subt=paste("https://rhinohide.wordpress.com (trb ver ",version,")",sep="")
img <- paste("../img/", hadsst2_file,"-",ly[y],".png",sep="")
png(img, width=600, height=350)
plot_raster_values(raster(mmry,layer=y),tit,subt)
dev.off()

# TO-DO write a trend routine
# for raster_t0 to raster_t1 compute trend in each cell
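
# A possible sketch for the trend TO-DO above (not the script's method; t0 and
# t1 in the commented call are hypothetical layer indices). raster::calc
# applies a function to each cell's vector of layer values, so a per-cell
# slope over the annual layers could look roughly like this:
cell_trend <- function(v) {
  # slope (per layer, i.e. per year) of a simple linear fit through the cell's values
  if (sum(!is.na(v)) < 2) return(NA)
  as.numeric(lm(v ~ seq_along(v))$coefficients[2])
}
# trend_map <- calc(subset(mmry, t0:t1), cell_trend)
# plot_raster_values(trend_map, "trend", subt)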

Results

Black is HadSST2 per CRU. Blue is mine. The differences are likely due to the NA handling in my annual mean method and to the different global mean methods.

HadSST2-1850-2009.png

HadSST2-2009.png

  1. 2010 August 11 at 12:19 am

    Hey Ron,

    I thought I’d throw my two cents in as far as the R code is concerned. Computing annual averages on gridded data is very numerically intensive. R is great, but being that it’s interpreted and not compiled, it will be really slow. I tried computing annual averages with code something like:

    for(i in 1:nlon){
    for(j in 1:nlat){
    for(k in 1:nyears){
    blah blah blah
    }
    }
    }

    and that just took forever. There’s a shortcut if you don’t mind even one NA among the 12 monthly values causing the annual average to go to NA. Here’s my code:

    rm(list = ls())

    library(RNetCDF)
    library(udunits); utInit()

    file = 'HadSST2_1850on.nc'

    # Open the netCDF file.
    nc = open.nc(file, write = FALSE)

    # Get coordinate variables.
    lon = var.get.nc(nc, 'lon'); nlon = length(lon)
    lat = var.get.nc(nc, 'lat'); nlat = length(lat)

    # Get time, its units and convert into year/month.
    time = var.get.nc(nc, 'time'); ntime = length(time)
    units = att.get.nc(nc, 'time', 'units')
    temp = utCalendar(time, units, 'array')
    time.steps = cbind(temp$year, temp$month)

    # Start and end dates
    #> time.steps[1,]
    #[1] 1850 1
    #> time.steps[ntime,]
    #[1] 2010 7

    # Load the sst data.
    sst = var.get.nc(nc, 'sst')
    #> dim(sst)
    #[1] 72 36 1927

    # Close the netCDF file.
    close.nc(nc)

    # Generate the area weights.
    area = matrix(cos(pi*lat/180), nlon, nlat, byrow = TRUE)
    area = area/sum(area)

    # Find out how many years there are.
    years = unique(time.steps[,1])
    nyears = length(years)

    # Compute annual anomaly. This cheap and easy way lets any missing value turn
    # the annual average into NA. One could avoid this by using a nested for loop with
    # mean(x, na.rm = T), but that would take FAR too long!

    sst.ann = apply(sst, c(1,2), function(x) filter(x, rep(1,12)/12)[seq(6,length(x),12)])

    #> dim(sst.ann)
    #[1] 161 72 36
    # should be 72 36 161. Use aperm to fix.

    sst.ann = aperm(sst.ann, c(2,3,1))

    #> dim(sst.ann)
    #[1] 72 36 161
    # fixed.

    # The rest is self-explanatory.

    sst.ann.nh = rep(NA, nyears)
    sst.ann.sh = rep(NA, nyears)

    north = lat > 0
    south = !north

    for(i in 1:nyears){
    sst.ann.nh[i] = sum(sst.ann[,north,i]*area[,north], na.rm = T)/sum(area[!is.na(sst.ann[,north,i])])
    sst.ann.sh[i] = sum(sst.ann[,south,i]*area[,south], na.rm = T)/sum(area[!is.na(sst.ann[,south,i])])
    }

    sst.ann.global = ts((sst.ann.nh + sst.ann.sh)/2, start = time.steps[1,], freq = 1)

    I didn’t see any mention of how many non-missing values must be present out of 12 to calculate a valid annual average. To take that into account, I think it would be a lot easier to write some code in C or Fortran to compute the averages for the entire grid and call it from a DLL in R. Combine the ease of R programming with the lightning speed of Fortran! I hope WordPress doesn’t mangle my code!

  2. 2010 August 11 at 5:02 am

    for(i in 1:nlon){
    for(j in 1:nlat){
    for(k in 1:nyears){
    blah blah blah
    }
    }
    }

    and that just took forever. There’s a shortcut if you don’t mind even one NA among the 12 monthly values causing the annual average to go to NA. Here’s my code:

    Yup. Those nested ‘for loops’ take forever. My current code is fast enough. It avoids the lat and lon loops, and just loops over months and years. But your apply function is likely to be faster. Both suffer from the ‘single NA tosses the whole year’ issue. (the slow piece in my code above is the lm0-lm3 global means loops).

    I see you are processing the hemispheres separately ala CRU.

    I really appreciate your 2c worth. I’m finally getting enough R under my belt that seeing different approaches to the same problem is valuable. Learning R is learning how to stop thinking in data streams and start thinking in matrices.

    ——

    Congrats on your paper!

  3. cce
    2010 August 11 at 9:15 am

    What is the possibility of you guys reprocessing the ICOADS data? With all due respect to the land surface record, the ocean data is where all the action should be. (Not that it would be a trivial undertaking)

    I would think that you could do interesting experiments where you take measurements near the coasts from only one type of source (i.e. different countries and different methods) and compare them to the nearest land stations. Or do comparisons a la Zeke’s UHI experiments from two different sources where they overlap. Lots of different combinations that you could do to find step changes and spurious trends.

  4. 2010 August 11 at 12:48 pm

    I haven’t looked at it yet, but I plan on it. When we combine stations in GHCN and GSOD, those are stationary. I’m not sure how ICOADS is coded, but many sea ‘station’ positions are a function of time. Conceptually it seems feasible, but with that added complexity. Will look into it later.

  5. cce
    2010 August 11 at 3:31 pm

    One exercise that shouldn’t be too difficult would be to produce two time series, one for land, one for ocean, from the grid boxes where they overlap. They should be nearly the same, with perhaps slightly faster warming over land. The step changes should pop right out. This is similar to what the Thompson 2008 paper did.

    You show the grid boxes where they overlap below (ERSST), so I would think it would be pretty easy to pull numbers out of them:
    https://rhinohide.wordpress.com/2010/07/25/trb-0-09-planet-ocean/

    You could do the same for the various ocean datasets; find the difference between the land and ocean, and compare the residuals.

  6. steven Mosher
    2010 August 12 at 3:24 pm

    Hey guys, I just finished rewriting the whole land analysis in R without a single loop.

    I used the zoo package for time series work (the package author has been a HUGE help)
    and raster, of course, for the geometry.

    Some major speed ups.

    Hope to take a look at the ocean stuff soon.

    If you have questions, all the package maintainers hang out on the R help list and they write wicked fast code. So basically I put in a question and start to write my own version, and before I finish I have a working sample from the help list.

  7. steven Mosher
    2010 August 12 at 3:37 pm

    Ron, it took me quite a while to unlearn the loop. I still struggle.
    Nick Stokes and the other guys who do matrix math in their heads also have a huge advantage.

    BTW, I’d recommend the zoo package to you. If you are using R’s time series class ts(), then zoo is a great way to go, especially since we have time series with missing years in the middle. There are some very slick ways to handle that issue.
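
    For example, something along these lines (just a sketch) pads the missing years back in as NAs so the gap stays visible:

    library(zoo)
    # toy annual series with a hole in the middle
    z <- zoo(c(0.1, 0.3, 0.2), order.by = c(1995, 1996, 1999))
    # merge against an empty series over the full run of years;
    # the missing years come back as NA
    z.full <- merge(z, zoo(, 1995:1999))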

  8. steven Mosher
    2010 August 12 at 4:19 pm

    Ron, the package uncompress will handle .gz files for you, so you can download and uncompress from within R.

  9. 2010 August 12 at 5:52 pm

    Hey, Steve. *Now* I am ready to look at code.
    Could you point me to a fresh copy?

    Uncompress handles both *.Z and *.gz?
    Docs only mention and show examples for *.Z
    There is a ‘non-CRAN’ package with gzip/gunzip capabilities,
    but you have to also add a bunch of other biometric packages.

    I haven’t started to work with ts series yet,
    but figured that they had a bunch of applicable methods.

    Also, check out the clim.pact package.

    I just wanted to work out some of the basics before I leaned on prebuilt tools. Part of that learning-by-doing thing.

    And which mailing list are you hanging out on? Not the r-sig-geo I take it.

  10. steven Mosher
    2010 August 14 at 3:32 am

    for gz? Lemme check again, R.utils also has good stuff

    WRT code: the last release was 2.0, but I’ve rewritten the entire thing, so now I’m putting
    old stuff back into it.

    Basically I restructured everything using zoo. ha, I can even combine all duplicates with one line
    of R now.

    I’m using R objects to save the various stages of analysis, so you can just load an object you’ve created before.

    I’ll give you a dump and a full user guide in a couple of days… sans graphics.

    The way I plan to do graphics is again using Rdata files. The graphics code is completely separated from the analysis code.

    I feel EXACTLY the same way about the prebuilt tools. I wanted to do the whole thing in the base package before I branched out.

    I do hang out on the R-sig list, but don’t ask, just read. The main mailing list is my hangout:

    http://www.r-project.org/mail.html

    The Mac list gets 1 mail per month.

    R-sig maybe 3-10 a day. Good guys there.

    The main list is 50-100 per day. You can see my stupid questions.

    Hey, one guy solved the calculate-anomaly problem with a linear model. Very slick.

    So, I’ll clean up what I’ve got and write a user manual… say 3-4 days.

  11. steven Mosher
    2010 August 14 at 11:45 am

    R.utils

    for unzipping .gz

    my install uses the package for other things as well
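
    e.g., something like this (sketch):

    library(R.utils)
    # unpack the download, keeping the .gz around
    gunzip("HadSST2_1850on.nc.gz", remove = FALSE)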

  12. Steven Mosher
    2010 August 17 at 3:17 pm

    http://stevemosher.wordpress.com

    There you go, Ron.

    From v2.mean raw to the anomalies for all stations in three easy steps. No loops.

    Start at the moshtemp101 post and just follow it through.

    Or you can start at moshtemp105. Requires v2.mean in your working directory.

  13. Steven Mosher
    2010 August 22 at 5:42 pm

    They have the wrong value set in the metadata for missing values: they set 1e30 in the metadata
    and the file has -1e30.

  14. 2010 August 22 at 8:33 pm

    Strange that RNetCDF picked up the right value.
    I thought it was a prob in the R packages.
    From my data reader …

    # open HadSST2 file
    nc <- open.ncdf("../download/HadSST2_1850on.nc")
    print(nc)
    
    # get all values
    sst <- get.var.ncdf(nc, varid="sst")
    n <- sst[1] # RNetCDF gets the NA value right without this
    sst[sst==n] <- NA
  15. Steven Mosher
    2010 August 24 at 10:33 am

    I use ncdf not RNetCDF.

    Basically if you look at the metadata, they set $missingvalue = 1e+30
    The file has -1e+30

    So, to make it work, I just opened the file and reset the metadata to -1e+30. Then it read the file in
    correctly with NAs. hmm.
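
    Roughly this kind of edit (a sketch with the ncdf package; assuming the attribute is named missing_value):

    nc <- open.ncdf("HadSST2_1850on.nc", write = TRUE)
    att.put.ncdf(nc, "sst", "missing_value", -1e30)
    close.ncdf(nc)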

    In the process, the guy doing the raster package has done some updates. Basically, you can
    (if you use ncdf) read the whole file into a raster brick. raster has moved from RNetCDF to ncdf in the last couple of months; see CRAN for ncdf.

    getting it into a brick is then like so.

    SST<-file("hadsst.nc", varname='sst')

    nlayers(SST)

    I don’t think it will be much faster, since raster is in R, but it looks to be cleaner.

    You can also extract the time from the z values without using udunits; I’ll hunt around for that command.

    Also, he added some functionality for us, which you can see by looking at the R-Forge project.

    Just look at the change log and you can see what he added (doing cellStats on a brick, by layer).

  16. Steven Mosher
    2010 August 24 at 11:47 am

    correction

    seafile<-"HadSST2_1850on.nc"
    SST<-brick(seafile,varname='sst')
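
    A quick usage sketch from there (assuming the brick reads cleanly; cellStats gives one value per layer):

    nlayers(SST)            # number of monthly layers
    cellStats(SST, 'mean')  # unweighted mean for each layer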
