Home > dmsp, GIStemp > DMSP: Unfinished Business

DMSP: Unfinished Business

2010 August 30


Building my own gridded temperature anomaly code introduced me to the R “raster” package. One of the things that both I and Mosher realized pretty quickly is that much of the GHCN metadata work I had been doing this spring could be refactored in R. The original work was done with a mix of original and extended java classes developed from netcdf-java as well as a Linux version of the gdal toolkit. With Peter O’Neill quietly asking me some questions about those results, it’s well past time to take a another look.


The R raster code was ridiculously easy to implement. What originally took a few weeks of creating new java classes, coding some station objects, writing some wrapper scripts, and testing different tiff files in my free time took less than an hour to reproduce in R.

The first thing to look at is whether the new code reproduces the same results as the old code. Here I pull a list of brightness values off the GISS-style metadata file I built back in May and compare it to a list of brightness values using the R raster code. This DMSP file is “world_stable_lights.tif”


# http://data.giss.nasa.gov/gistemp/sources/GISTEMP_sources.tar.gz
# http://rhinohide.org/rhinohide.cx/co2/ghcn-java/results/v2.giss.inv-latest.txt
# http://ngdc.noaa.gov/dmsp/data/web_data/stable_lights (world_stable_lights.tar)
# http://www.ngdc.noaa.gov/dmsp/download_rad_cal_96-97.html (rad_cal.tar)

# get table with coordinates
invcols <-c("id","name","lat","lon","alt","elev","ru","pop","terr","arid",
invwids <- c(11,32,6,8,5,5,1,5,2,2,2,2,1,2,16,1,1,5)
giss_inv <- read.fwf("../giss/data/v2.inv", widths=invwids,col.names=invcols,as.is=T)
my_inv <- read.fwf("v2.giss.inv-latest.txt", widths=invwids,col.names=invcols,as.is=T)

# read radiance values
radcal <- raster("../dmsp/data/world_stable_lights.tif")

# for each coord, get radiance
radiance <- xyValues(radcal,cbind(my_inv$lon,my_inv$lat))

# compare the old with the new
sd(my_inv$bright - radiance, na.rm=T) # 0
sd(giss_inv$bright - radiance, na.rm=T) # 29.25

So far so good. The R code exactly reproduces my earlier GeoTiffDataReader class. No feat of genius, to be sure, but I nice confirmation nevertheless. But the values from the “world_stable_lights” have a poor match rate with the GISS brightness values. So why am I using “world_stable_lights” for this field? That choice actually arose because I was looking for the imagery that would best let me reproduce the A/B/C brightness categories in GHCN. And I struggled a bit in my last attempt to investigate the relation between the brightness index and the ABC brightness categories. One problem is that the index is developed by GISS and the categories by GHCN and they are not clearly distinct and overlap as a quartile display will clearly show. I forgo that for the moment as a distraction but will come back to it in a later post. Today I’m concentrating on the GISS brightness.

If we want the best match for the GISS brightness index, Hansen 2010 tells us where to look:

We use a nightlight radiance data set [Imhoff et al., 1997] that is publicly available
[http://www.ngdc.noaa.gov/dmsp/download_rad_cal_96-97.html] (measurements made between
March 1996 and February 1997)] at a resolution of 30″ × 30″ (0.0083°×0.0083°), which is a
linear scale of about 1 km. In Figure 1(a) the global radiances are shown after averaging to 0.5°
× 0.5° resolution, i.e., a pixel (resolution element) in Figure 1(a) is an average over 3600 high
resolution pixels. This averaging reduces the maximum radiance from about 3000 μW/m2/sr/μm
to about 670 μW/m2/sr/μm.

There are two ‘tif’ files included in the referenced tar ball, one with 0.008333 and one with half that at 0.01666 deg. This is convenient. I didn’t notice the low res version earlier and used the R raster ‘aggregate’ function to decrease the resolution (aggregrate neighboring cells into one) during some alternate investigations into the data. It took about 4 hours or so. Far less efficient than doing the same function with gdal.

So taking a quick look at the ‘fit’ of the GISS brightness field with both the hires and lowres versions. Same basic code as above.

radcal_hi = raster("../dmsp/data/world_avg.tif")
rad_hi = xyValues(radcal_hi,cbind(giss_inv$lon,giss_inv$lat))
sd(giss_inv$bright - rad_hi, na.rm=T)
# [1] 13.08520
radcal_lo = raster("../dmsp/data/world_avg_low_res.tif")
rad_lo = xyValues(radcal_lo,cbind(giss_inv$lon,giss_inv$lat))
sd(giss_inv$bright - rad_lo, na.rm=T)
# [1] 12.59317

So we get a slightly better fit with the low res version.

Is it good enough? Let’s look. The defining parameter for the rural/urban split given in Hansen 2010 is 32 μW/m2/sr/μm. This is the radiance value. The tif file is scaled to a ‘DN’ also known as a ‘digital number.’ The readme in the NOAA radiance calibration tarball gives the formula for the the conversion.

world_avg – The final Radiance Calibrated world product. Each pixel
contains a DN value between 0 and 255. The values between 0
and 254 can be converted to radiance values with the equation
Radiance = DN^(3/2) * 10^(-10) watts/cm^2/sr.
The value 255 signifies that no unsaturated pixels were found
over that location for the duration of the reduced gain experiment.

32 = DN^(3/2)
32^(2/3) = DN
10 ~= DN

So the ‘brightness’ number (DN) listed in the world_avg.tif which marks the break between rural and urban (and peri-urban) for GISTEMP is roughly ’10.’

I create two masks. One for GISTEMP brightness above and below ’10’. The other for the radiance DN read from the radiance calibrated tif file. Then I look for the stations that don’t fit both dark or both bright categories. I use the low resolution file.

radcal_lo = raster("../dmsp/data/world_avg_low_res.tif")
#rad_lo = xyValues(radcal_lo,cbind(giss_inv$lon,giss_inv$lat))
rad_lo = xyValues(radcal_lo,cbind(giss_inv$lon-(5*d),giss_inv$lat+(2*d)))
sd(giss_inv$bright - rad_lo, na.rm=T)
mask_a  = giss_inv$bright > 10
mask_b  = rad_lo > 10
stns_b1 = giss_inv[(mask_a) & ! mask_b,] # urban in giss, rural in tif
stns_b2 = giss_inv[(!mask_a) & mask_b,] # rural in giss, urban in tif
c <- nrow(giss_inv)
# [1] 7364
a <- nrow(stns_b1)
# [1] 1536
b <- nrow(stns_b2)
# [1] 156
1 - (a+b)/c
# 0.7702336

The above code snippet basically takes the GISS v2.inv file and selects two categories: stations that are urban in GISTEMP but are rural in the world_avg_low_res and stations that are rural in GISTEMP but are urban in the world_avg_low_res. The fit is only fair. A 77% match rate for brightness seems pretty low if we are using the same coordinates (v2.inv) and the same data files (world_avg.tif).

Can we do better? Recalling my concerns when I last visited this data, I start searching nearby cell-offsets for a better fit.

Starting with the low resolution file and an offset that is one half cell width, I begin a manual search. My best fit is below.

d = .01666 / 2
rad_lo = xyValues(radcal_lo,cbind(giss_inv$lon-(5*d),giss_inv$lat+(2*d)))
sd(giss_inv$bright - rad_lo, na.rm=T)
# 8.965257
mask_a  = giss_inv$bright > 10
mask_b  = rad_lo > 10
stns_b1 = giss_inv[(mask_a) & ! mask_b,] # urban in giss, rural in tif
stns_b2 = giss_inv[(!mask_a) & mask_b,] # rural in giss, urban in tif
c <- nrow(giss_inv)
# [1] 7364
a <- nrow(stns_b1)
# [1] 760
b <- nrow(stns_b2)
# [1] 218
1 - (a+b)/c
# 0.8671917

Moving to the hi res file …

d = .008333 / 2
rad_hi = xyValues(radcal_hi,cbind(giss_inv$lon-(11*d),giss_inv$lat+(5*d)))
sd(giss_inv$bright - rad_hi, na.rm=T)
# 8.795083
mask_a  = giss_inv$bright > 10
mask_b  = rad_hi > 10
stns_b1 = giss_inv[(mask_a) & ! mask_b,] # urban in giss, rural in tif
stns_b2 = giss_inv[(!mask_a) & mask_b,] # rural in giss, urban in tif
c <- nrow(giss_inv)
# [1] 7364
a <- nrow(stns_b1)
# [1] 746
b <- nrow(stns_b2)
# [1] 224
1 - (a+b)/c
# 0.8682781

We can look at the histogram of the difference between GISTEMP and the DMSP/RC lookup. Here I use the hi-res file.


Similarly, a scatter plot using GISTEMP brightness and the DMSP/RC lookup data.



So what have I shown?

You cannot extract the GISS Brightness values straight from the world_avg.tif or world_avg_low_res.tif files that Hansen 2010 points to in the descriptive GISTEMP paper.

The ‘fit’ as defined both by standard deviation and by categorization +/- 32 10^(-10) watts/cm^2/sr 32 μW/m2/sr/μm is better when the lookup uses a cell that is shifted to the west (lon – 0.04165) and north (lat + 0.01666). . This might be indicative of a coordinate error in the DMSP/RC tif file or in the algorithms used by GISS. It might also be an artificat stemming from the use of a resolution change to the DMSP/RC at GISS. I dunno.

As a note, GISTEMP uses a threshold of DN>10 for periurban and DN>35 for urban as defined in the text_to_binary.f file

If I was using a data source different from GISS to calculate this value, I might consider my 86% match rate to be sufficient and declare victory. But 86% is not good enough when I am supposedly using the same data for input as GISS. Look for more on this.

Oh. And the R raster package makes the code for this ridiculously easy.

I leave you with some related reading on why a ‘radiative calibrated’ data set exists …

Imhoff et al. (1997) investigated how well DMSP-OLS defined urban areas in the US correspond
with census data. Using an iterative thresholding technique, they established a threshold of 89%
for the 1994-95 city lights data set. This threshold was identified as being the level where, on
average, on could shrink the city-lights but preserve the integrity of the lit urban cluster (i.e. no
internal fragmentation within the cluster). This was detected via analysis of the perimeter vs. area
of clusters, with a sudden jump in perimeter length indicative of fragmentation. When overlaid
with US Census bureau data, there was no significant difference between the two areal
assessments. This approach was further exploited by Small et al. (2005) who used this technique
to investigate more general corrections for the overglow phenomenon associated with DMSP-
OLS data (see section 3.3.1). The Imhoff et al. (1997) analysis applies only to the conterminous
United States and it is unknown how this threshold performs in other parts of the world.

Thematic Guide on Night-time light Satellite Data

Previous analyses have revealed a consistent disparity
between various spatial measures of urban extent and the
spatial extent of lighted areas in the night lights datasets
(Welch, 1980; Welch & Zupko, 1980). Specifically, the
lighted areas detected by the OLS are consistently larger
than the geographic extents of the settlements they are
associated with. The larger spatial extent of lighted area,
relative to developed land area, is sometimes referred to as
‘‘blooming’’. The blooming is the result of three primary
phenomena, including: (1) the relatively coarse spatial
resolution of the OLS sensor and the detection of diffuse
and scattered light over areas containing no light source , (2)
large overlap in the footprints of adjacent OLS pixels, and
(3) the accumulation of geolocation errors in the composi-
ting process (Elvidge et al., 2004). In the context of this
study, blooming refers to spurious indication of light in a
location that does not contain a light source.

In order to offset the blooming effect, Imhoff et al.
(1997) proposed using a threshold of 89% detection
frequency to eliminate less frequently detected lighted
pixels at the peripheries of large urban areas. Imposing a
detection frequency threshold effectively shrinks the
lighted areas so they are more consistent with admin-
istrative definitions of urban extent. The drawback
associated with detection frequency thresholds is that they
also attenuate large numbers of smaller, less frequently
detected settlements. The 89% threshold proposed by
Imhoff et al. (1997) was derived from an average of
85%, 89% and 94% thresholds determined for the cities of
Chicago, Sacramento and Miami, respectively. At the state
level, the 89% threshold significantly increased the
correlation (from 0.87 to 0.97) between lighted area and
US. Census-defined urban areas, despite the numerous
caveats of using administrative boundaries that were
discussed by Imhoff et al. (1997). In a subsequent analysis,
Sutton et al. (2001) obtained a correlation of 0.68 between
Ln(lighted area) and Ln(population) when using a threshold
of 80%. These authors recognized the limitations of using a
single threshold for a global analysis and also used a
combination of 40%, 80% and 90% thresholds for a global
analysis of lighted area and aggregate population (Sutton et
al., 2001). In another recent study (Henderson et al., 2003),
he authors were able to avoid some of the confounding
factors associated with administrative (cadastral) delinea-
tions of urban extent by comparing lighted area with urban
boundaries derived from Landsat TM imagery. Using
Supervised Maximum Likelihood classifications of Beijing,
Lhasa and San Francisco, these authors obtained optimum
thresholds of 97%, 88% and 92%, respectively. They also
found that these thresholds resulted in significant lateral
shifts between the lighted area and the Landsat-derived
urban boundary. These studies all suggest that it may be
possible to obtain reasonable agreement between lighted
area and various measures of city size but these studies also
reveal significant variability in the relationship between
lighted area and different definitions of urban extent. All of
these studies have emphasized the need for more extensive
analyses of lighted area, detection thresholds and urban

Spatial analysis of global urban extent from DMSP-OLS night lights

  1. 2010 August 30 at 4:47 pm

    Marc Imhoff is out of office.
    But I’m trying! 😀

  2. 2010 August 30 at 6:27 pm

    Well, in fact I think you can, except for the two South Pole stations – but only if your computer matches the the floating point truncation/rounding behaviour of the GISS computer originally used to find these brightness values. So quite possibly even GISS can no longer reproduce these values unless they have kept that particular machine.

    Using the world_avg.tif file I have been able to match the GISS brightness for all but 121 of the 7364 stations in the GISS v2.inv file. Two of these 121 stations are the two South Pole stations, AMUNDSEN-SCOT and Clean_Air, for which the tif file provides a somewhat unlikely brightness value of 8, while the GISS file gives a more likely zero value. For the 119 stations where I obtained a different brightness value I found the the pixel either immediately above or immediately below the one I had used in the tif file gave the GISS brightness value. Finally, I checked the row value calculated, before rounding, for each of these 119 stations, and found that each calculated tif file row had a fractional part of 0.5, so that the rounded value will be determined by the truncation/rounding behaviour of the floating point calculations.

    On a related topic, I have corrected the latitude/longitude values for some of the airport stations in v2.inv, using a more reliable source for these values and verifying the location of the airpost using Google Earth, and then correcting the brightness values using world_avg.tif file. I intend to correct all the airport stations in this way, where possible (so far I’ve found a few cases where there are a couple of nearby airports as possible candidates, or where the airport no longer exists, although even then the location is still possibly visible in Google Earth as a vegetation change. I’ve also corrected some of the blunders such as the use of latitude E/W of Madrid instead of Greenwich and stations out to sea, as well as a few cases where I knew the location of the actual station with greater precision rather than just that it is “at the airport”. Finally, Reto Ruedy let me know that 64308160000, described as TENKODOGO, with coordinates in Burkina Faso, is actually Zaragoza airport, so I’ve corrected that too.

    When I’ve corrected sufficient stations to constitute a reasonable sized sample I intend to write a post on the distribution of these location errors, and when I’ve worked my way through all the airport stations (I’ve checked about 300 so far) I will make the corrected metadata file available. If you would like an incomplete “preview” version before then just let me know.

  3. 2010 August 30 at 6:30 pm

    My post just above should have started by quoting you:

    “You cannot extract the GISS Brightness values straight from the world_avg.tif or world_avg_low_res.tif files that Hansen 2010 points to in the descriptive GISTEMP paper.”

    but unfortunately I got the tags wrong.

  4. 2010 August 30 at 9:17 pm

    Using the world_avg.tif file I have been able to match the GISS brightness for all but 121 of the 7364 stations in the GISS v2.inv file.

    This is based on just controlling precision in the cell calculation?

  5. Steven Mosher
    2010 September 13 at 9:38 am


    Ron, I’ve been working with the raster package maintainer on some new routines and CRAN is botching up the builds, so I figured I would circle back and have a look at using raster for the metadata. Glad to find that you are running point ahead of me since GIS is not my thing. my goal would be to create a raster STACK of metadata since the data comes in various resolutions. Looks like peter is also doing cool work.
    I’ll look at your code and see if there is anything we can add to the raster package to make the job easier.

    Note you can write out bricks as netcdf now, the documentation doesnt reflect this but it works and is tested

    look for some major improvements in 1.5.4

  6. Steven Mosher
    2010 September 13 at 12:10 pm



    when I download and untar that and look at the lores.tif I get all zeros

  7. Steven Mosher
    2010 October 14 at 1:40 am

    I’ve got some new stuff on nightlights.. looks like its shifted from the truth ever so slightly, i did some comparisons with Gmaps kinda hard to explain you have see it.

    i’ll talk to peter and see what he thinks

  1. 2010 October 18 at 1:03 am
Comments are closed.