Reading URBS files into R

URBS (Unified River Basin Simulator) is a hydrologic modelling platform that combines rainfall-runoff modelling with runoff-routing.  URBS is commonly used in operational hydrology to provide short-term flow forecasts for flood warnings.

I want to look at and analyse some output from URBS, and of course, I’d like to do it in R.  Getting the data into R is a challenge.  URBS files have several different types of data stacked on top of each other.  The number of columns varies and, for the files I have, the dates are in Excel format.  An example is shown below, with the line number of the file in the left-hand column.  The first 204 lines of the file are hourly gross rain, followed by effective rain, then river level and flow rate.  Although ‘Gross Rain’ is the heading of a column, the data within that column are hourly time stamps stored as Excel serial dates.

Line  Field 1         Field 2                     Field 3                     Field 4
1     Gross Rain      River station 1             River station 2
2     41062.42        0.78                        1.2
205   Effective Rain  River station 1             River station 2
206   41062.42        0                           0
409   River level     River station 1 (modelled)  River station 1 (measured)  River station 2 (modelled)
410   41062.375       1.31                        1.30                        0.68
614   Flow Rate       River station 1 (modelled)  River station 1 (measured)  River station 2 (modelled)

The following function reads in an URBS file in CSV format and returns the various segments as data frames stored in a list.

Key aspects include:

  • The file is read in a line at a time using readLines
  • grep is used to find lines that contain letters; these are the heading lines that mark the start of each data segment
  • The number of the last line of the file is found using countLines from the R.utils package
  • The type of data contained in a segment is read from the heading of that segment e.g. ‘Gross Rain’, ‘River Level’ etc.
  • Headings for each segment are read in and processed separately from the data for each segment.  This is a safer approach when the number of headings and columns may differ
  • Column headings are cleaned up using gsub to remove repeated and trailing periods
  • Columns that contain only missing values, a problem that seems to occur with these types of files, are deleted
  • Excel dates e.g. 41070.8333 are converted to POSIXct dates e.g. 2012-06-10 19:59:57
  • Both -99 and -99.0 are used to designate missing data
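
To make the first few steps concrete, here is a minimal sketch that works on a handful of lines held in memory rather than on a real file.  The toy.lines vector is invented for illustration (only its layout follows the example above), and the heading is split with strsplit instead of read.csv to keep things short, so treat it as a sketch rather than a drop-in replacement.

```r
# Toy lines mimicking a stacked URBS-style CSV (values invented for illustration)
toy.lines <- c("Gross Rain,River station 1,River station 2",
               "41062.42,0.78,1.2",
               "41062.46,0.5,-99",
               "River level,River station 1 (modelled),River station 1 (measured)",
               "41062.375,1.31,1.30")

# Any line containing a letter is treated as the heading of a new segment
starts <- grep("[[:alpha:]]", toy.lines)
# Each segment ends on the line before the next heading, or at the end of the input
ends <- c(starts[-1] - 1, length(toy.lines))
print(starts)  # 1 4
print(ends)    # 3 5

# Clean the second heading in the same three steps used in the function
raw <- strsplit(toy.lines[starts[2]], ",")[[1]]
clean <- make.names(raw)           # spaces and brackets become '.'
clean <- gsub("[.]+", ".", clean)  # collapse runs of '.'
clean <- gsub("[.]$", "", clean)   # drop a trailing '.'
print(clean)
```

Note the double brackets in "[[:alpha:]]".  With single brackets the pattern matches only the literal characters ':', 'a', 'l', 'p' and 'h', which is easy to miss because most headings contain at least one of them.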

Note that the date conversion uses the UTC time zone.  This may need to be changed.
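
As a rough illustration of the time zone issue, the sketch below converts two serial dates taken from this post and then shows one way to reinterpret the result as local clock time.  The zone name "Australia/Brisbane" is only an assumption for the example; use whatever zone your URBS output was recorded in.

```r
serial <- c(41062.375, 41070.8333)  # Excel serial dates (days since 1899-12-30)

# Same conversion as in the function: days -> seconds, counted from the Excel origin
utc <- as.POSIXct(serial * 60 * 60 * 24, origin = "1899-12-30 00:00:00", tz = "UTC")
stamp <- format(utc, "%Y-%m-%d %H:%M:%S")
print(stamp)  # "2012-06-02 09:00:00" "2012-06-10 19:59:57"

# If the serial dates are really local clock times, re-read the printed
# clock time in the local zone (the zone name here is only an example)
local <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "Australia/Brisbane")
print(format(local, "%Y-%m-%d %H:%M:%S %Z"))
```

Simply passing a different tz to the original as.POSIXct call would not do this: the origin is still read as UTC, so the clock times shift by the zone offset instead of staying as they appear in the file.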

library(R.utils)
ParseURBSfile <- function(file.name) {

  # Args
  #  file.name = name (and path) of the URBS file in csv format

  # Value
  #  A list with a data frame for each type of data in the URBS file

  # '[[:alpha:]]' matches any letter; the class name must sit inside a bracket expression
  starts <- grep('[[:alpha:]]', readLines(file.name)) # line numbers of headings
  ends <- as.vector(c(starts[-1]-1, countLines(file.name))) # end of each segment

  n <- length(starts) # number of segments in the file
  urbs.data <- vector(mode = 'list', length = n) # blank list to hold output

  # function to process the headings
  Read.heading <- function(file.name, line.heading){

    # function to read in the headings for a segment of the URBS file
    #
    # Args
    #    file.name = name of file
    #    line.heading = line number of the heading within the URBS file

    heading <- read.csv(file.name, skip=line.heading-1, header=FALSE, nrows=1, 
                  colClasses='character' )
    heading <- make.names(heading)
    heading <- gsub('[.]+','.', heading) # replace multiple '.'  with single '.'
    heading <- gsub('[.]$','', heading) # remove trailing '.' (an unescaped '.' would strip any last character)
    data.type <- heading[1] # type of data in this segment
    heading[1] <- 'date' # first column contains date/time information

    return(list(heading=heading, data.type=data.type))
  }

  for(i in seq_len(n)){ # loop through all segments (does nothing if no headings were found)
    segment <- Read.heading(file.name, starts[i])
    my.data <- read.csv(file.name, skip=starts[i], nrows=ends[i]-starts[i],
                  na.strings =  c("-99.0","-99"), header=FALSE)
    my.data[my.data == -99] <- NA # get rid of any -99 values that slip through

    names(my.data) <- segment$heading

    # Delete any columns that only contain missing data
    if(any(colSums(is.na(my.data)) == nrow(my.data))) 
           my.data <- my.data[-which(colSums(is.na(my.data)) == nrow(my.data))]

    # Convert from excel serial date
     my.data$date <- as.POSIXct(my.data$date*60*60*24, 
         origin="1899-12-30 00:00:00", tz='UTC')

    urbs.data[[i]] <- my.data
    names(urbs.data)[i] <- segment$data.type
  }

  return(urbs.data)

}
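
The missing-data steps inside the loop can be tried in isolation.  The sketch below uses a few invented rows read from a string, and it drops empty columns with logical indexing rather than -which(), which is a small variation on the function above and not what it actually does.

```r
# Three invented rows: column 3 is entirely missing, and "-99.00" is a
# spelling of the missing code that na.strings does not list
txt <- "41062.42,0.78,-99,1.2
41062.46,-99.0,-99,1.1
41062.50,0.10,-99,-99.00"

my.data <- read.csv(text = txt, header = FALSE, na.strings = c("-99.0", "-99"))

# "-99.00" was read as the number -99, so catch it after the read
my.data[my.data == -99] <- NA

# Keep only the columns that have at least one non-missing value
all.na <- colSums(is.na(my.data)) == nrow(my.data)
my.data <- my.data[, !all.na, drop = FALSE]

print(my.data)
```

Logical indexing also behaves sensibly when no column is empty, so the surrounding if() test is not needed in this form.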
