Inconsistent results in R with RNetCDF - why?
Posted by sarcozona on Stack Overflow
Published on 2011-06-20T14:56:57Z
Indexed on 2011/06/23 0:22 UTC
I am having trouble extracting data from NetCDF data files using RNetCDF. The data files each have 3 dimensions (longitude, latitude, and a date) and 3 variables (latitude, longitude, and a climate variable). There are four datasets, each with a different climate variable.
Here is some of the output from print.nc(p8m.tmax) for clarity. The other datasets are identical except for the specific climate variable.
dimensions:
    month = UNLIMITED ; // (1368 currently)
    lat = 3105 ;
    lon = 7025 ;
variables:
    float lat(lat) ;
        lat:long_name = "latitude" ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
    float lon(lon) ;
        lon:long_name = "longitude" ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
    short tmax(lon, lat, month) ;
        tmax:missing_value = -9999 ;
        tmax:_FillValue = -9999 ;
        tmax:units = "degree_celsius" ;
        tmax:scale_factor = 0.01 ;
        tmax:valid_min = -5000 ;
        tmax:valid_max = 6000 ;
I am getting behavior I don't understand when I use the var.get.nc function from the RNetCDF package.
For example, when I attempt to extract 82 values beginning at stval from the maximum temperature data (p8m.tmax <- open.nc("tmaxdataset.nc")) with
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))
(where lon_val and lat_val specify the location in the dataset of the coordinates I'm interested in, and stval is set to which(time_vec==200201), which in this case equals 1285), I get Error: Invalid argument
But after successfully extracting 80 and 81 values
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,80))
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,81))
the command with 82 works:
> var.get.nc(p8m.tmax,'tmax', start=c(lon_val, lat_val, stval),count=c(1,1,82))
[1] 444 866 1063 ... [output snipped]
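For what it's worth, the request that failed does not run past the end of the variable. A minimal sketch of that sanity check, assuming the dimension lengths from the print.nc output above and the lon_val and lat_val values given further down (request_in_bounds is a hypothetical helper, not part of RNetCDF):

```r
# Hypothetical helper: TRUE when every start/count pair fits inside its dimension.
request_in_bounds <- function(start, count, dim_len) {
  stopifnot(length(start) == length(count), length(count) == length(dim_len))
  all(start >= 1, count >= 1, start + count - 1 <= dim_len)
}

# Dimension lengths in the order used by tmax(lon, lat, month)
dim_len <- c(lon = 7025, lat = 3105, month = 1368)

# The request that raised "Invalid argument": the last month index it touches
# is 1285 + 82 - 1 = 1366, which is inside the 1368 available months.
ok <- request_in_bounds(start = c(1595, 1751, 1285),
                        count = c(1, 1, 82),
                        dim_len = dim_len)
print(ok)
```

So an out-of-range start or count does not appear to explain the error.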
The same problem occurs in the identically structured tmin file, but at 36 instead of 82:
> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval),count=c(1,1,36))
produces Error: Invalid argument
But after repeating with counts of 30, 31, etc.,
> var.get.nc(p8m.tmin,'tmin', start=c(lon_val, lat_val, stval), count=c(1,1,36))
works.
These examples make it seem like the function is failing at the last count, but that actually isn't the case. In the first example, var.get.nc gave Error: Invalid argument after I asked for 84 values. I then narrowed the failure down to the 82nd count by varying the starting point in the dataset and asking for only 1 value at a time. The particular number the problem occurs at also varies. I can close and reopen the dataset and have the problem occur at a different location.
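The one-value-at-a-time narrowing described above can be sketched roughly as follows. This is only an illustration: first_failing_index and fake_read are hypothetical names, and fake_read is a stand-in that fails at a fixed index so the sketch runs without the data files (the real failures were not this repeatable).

```r
# Hypothetical helper: call read_one(i) for each index and return the first
# index that raises an error, or NA if every read succeeds.
first_failing_index <- function(read_one, indices) {
  for (i in indices) {
    failed <- tryCatch({
      read_one(i)
      FALSE
    }, error = function(e) TRUE)
    if (failed) return(i)
  }
  NA_integer_
}

# Stand-in reader (assumption): fails at month index 1366, i.e. the 82nd
# value when starting from 1285.
fake_read <- function(i) if (i == 1366) stop("Invalid argument") else i

bad <- first_failing_index(fake_read, 1285:1368)
print(bad)

# With the real file, the reader would be something like:
# read_one <- function(i) var.get.nc(p8m.tmax, "tmax",
#                                    start = c(lon_val, lat_val, i),
#                                    count = c(1, 1, 1))
```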
In the particular examples above, lon_val and lat_val are 1595 and 1751, respectively, identifying the location in the dataset along the lon and lat dimensions for the longitude and latitude I'm interested in. The 1595th longitude and 1751st latitude are not the problem, however. The problem occurs with all other latitudes and longitudes I've tried.
Varying the starting location in the dataset along the month dimension (stval) and/or specifying it differently (as a number in the command instead of the object stval) also does not fix the problem.
This problem doesn't always occur. I can run identical code three times in a row (clearing all objects in between runs) and get a different outcome each time. The first run may choke on the 7th entry I'm trying to get, the second might work fine, and the third run might choke on the 83rd entry. I'm absolutely baffled by such inconsistent behavior.
The open.nc function has also started to fail with the same Error: Invalid argument. Like the var.get.nc problems, it also occurs inconsistently.
Does anyone know what causes the initial failure to extract the variable, and how I might prevent it? Could it have to do with the size of the data files (~60GB each) and/or the fact that I'm accessing them through networked drives?
This was also asked here: https://stat.ethz.ch/pipermail/r-help/2011-June/281233.html
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape_0.8.4 plyr_1.5.2 RNetCDF_1.5.2-2
loaded via a namespace (and not attached):
[1] tools_2.13.0
© Stack Overflow or respective owner