Reading data supplied from Hydstra

(Updated 16 Nov, 2015 to include a function that takes the name of a Hydstra file and returns a data frame. See the end of the post.)

Hydstra (previously Hydsys) is a widely used computer system for storing hydrologic data. Hydstra can export csv files and these are commonly supplied if hydrologic and water quality data are requested from agencies.

The files come with a multiline header and this needs to be summarised as a short column name for use in R.  The first few lines of an example Hydstra file are shown below. Data have been read in with `header = FALSE’, so columns are labelled V1, V2, V3,..  MIssing data is labelled NA.

There is information on  date/time in the first column.  The second column contains instantaneous values of water level data for gauge 229149A, while the third column contains the quality code for the water level data. The first row is the gauge number (e.g. 229149), the second row is a code that specifies observation type (e.g. 100.0), the third row is text specifying observation type and units (e.g. Turbidity in NTU), the fourth row specifies instantaneous values (Inst) or quality codes (Qual).

V1 V2 V3 V4 V5 V6 V7 V8 V9
Time 229149A NA 229149A NA 229643A NA 229643A NA
and 100 NA 810 NA 100 NA 141 NA
Date Water Level (m) NA Turbidity (NTU) NA Water Level (m) NA Discharge (Ml/d) NA
NA Inst Qual Inst Qual Inst Qual Inst Qual
2004-10-31 00:00:00 0.491 2 5.4 2 0.025 65 1.19 150
2004-10-31 00:12:00 0.49 2 5.3 2 0.026 65 1.23 150
2004-10-31 00:24:00 0.49 2 5.1 2 0.025 65 1.18 150
2004-10-31 00:36:00 0.489 2 5 2 0.023 65 1.1 150
2004-10-31 00:48:00 0.489 2 4.8 2 0.022 65 1.06 150
2004-10-31 01:00:00 0.488 2 4.7 2 0.024 65 1.14 150

Table 1: Headers and first few lines of a Hydstra csv file

We need to create column labels that use the header information which is spread across four rows. I use column labels of the form:gauge number.observation type.value type; for example X229149A.w.i, where w = Water Level and i = Inst.  The following function MakeNames_Hydstra processes the first 4 rows of the Hydstra file to create column names.  Note that each missing value in the header is equal to the value in the column its left.  This means we can use the last-observation-carried-forward approach to find the required value.  The zoo package provides the function na.locf for this purpose.

MakeNames_Hydstra <- function(Hydstra.csv){
  
  library(zoo)

  r1 <- as.vector(unlist(Hydstra.csv[1, ]))
  r1 <- na.locf(r1)
  r1 <- paste0('X', r1)
  
  r3 <- as.vector(unlist(Hydstra.csv[3, ]))
  r3 <- na.locf(r3)
  r3 <- tolower(substring(r3,1,1))

  r4 <- as.vector(unlist(Hydstra.csv[4, ]))
  r4 <- tolower(substring(r4,1,1))
  
  my.names <- paste(r1, r3, r4, sep=".")
  my.names[1] <- "datetime"
  my.names
}

  
MakeNames_Hydstra(Ex.Hydstra)[1:9] # Only print the first 9 names
## [1] "datetime"     "X229149A.w.i" "X229149A.w.q" "X229149A.t.i"
## [5] "X229149A.t.q" "X229643A.w.i" "X229643A.w.q" "X229643A.d.i"
## [9] "X229643A.d.q"

The safest way to use the function is to read the header and data information separately to ensure that the internal type of the data in R, ?typeof(), is determined by the data not the header.

  1. Read the first four rows of the Hydsta file
  2. Make the names
  3. Read the remainder of the data file, set skip = 4 and header = FALSE
  4. Assign the names
# Read the header information
library(repmis) # to read data from dropbox

Hydstra.Head <- source_data("https://dl.dropboxusercontent.com/u/10963448/Ex-Hydstra.csv", 
                      header = FALSE,
                      stingsAsFactors = FALSE,
                      na.strings= c(NA, ""),
                      nrow = 4)
# Read the data
Hydstra.data <- source_data("https://dl.dropboxusercontent.com/u/10963448/Ex-Hydstra.csv", 
                      na.strings= c(NA, ""),
                      nrow = -1, # read to the end of the file
                      skip = 4)


names(Hydstra.data) <- MakeNames_Hydstra(Hydstra.Head)
Hydstra.data[ ,1:5] # print only the first 5 columns
##              datetime X229149A.w.i X229149A.w.q X229149A.t.i X229149A.t.q
## 1 2004-10-31 00:00:00        0.491            2          5.4            2
## 2 2004-10-31 00:12:00        0.490            2          5.3            2
## 3 2004-10-31 00:24:00        0.490            2          5.1            2
## 4 2004-10-31 00:36:00        0.489            2          5.0            2
## 5 2004-10-31 00:48:00        0.489            2          4.8            2
## 6 2004-10-31 01:00:00        0.488            2          4.7            2

This is wrapped together in the function Make_df_Hystra. Provide a file name to a Hydstra file, that includes a full path, and the function will return a data frame.

 

Code is available as a gist.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s