Quality codes in hydrologic data

Hydrologic data often have associated quality codes. Unfortunately, these codes have become complicated, with different systems used by various jurisdictions and data providers. As a result, it can be challenging to make use of quality codes when analysing data. However, we need to be sure that key results don’t depend on just a few questionable data points.

Examples of quality codes

Quality code   Definition
1              Very good data
2              Good quality edited data, minor editing only
15             Minor editing only
50             Good or reliable data, medium editing required
151            Poor data or data yet to be verified, use with caution
240            Data lost due to reasons beyond the control of the hydrographic contractor
255            Invalid or lost data

Careful analysis requires:

  • Keeping track of the quality codes associated with data
  • Excluding data that are clearly suspect
  • Identifying any data points that have a large influence on the key results and being sure they are reasonable
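
The first two points can be sketched in base R. This is only an illustration: the lookup table reuses the example codes listed earlier, the observations are made up, and the rule that codes of 151 and above are "suspect" is an assumption that will depend on the data provider's coding system. The names qcode.lookup, obs and obs.clean are hypothetical.

```r
# Lookup table built from the example quality codes
qcode.lookup <- data.frame(
  qcode = c(1, 2, 15, 50, 151, 240, 255),
  definition = c(
    "Very good data",
    "Good quality edited data, minor editing only",
    "Minor editing only",
    "Good or reliable data, medium editing required",
    "Poor data or data yet to be verified, use with caution",
    "Data lost due to reasons beyond the control of the hydrographic contractor",
    "Invalid or lost data"
  ),
  stringsAsFactors = FALSE
)

# ASSUMPTION: codes of 151 and above are treated as suspect (illustrative only)
qcode.lookup$suspect <- qcode.lookup$qcode >= 151

# A few made-up observations, each carrying its quality code
obs <- data.frame(
  xdate = as.Date("2000-01-01") + 0:4,
  flow  = c(2.1, 3.4, 2.8, 40.0, 2.5),
  qcode = c(1, 2, 1, 151, 255)
)

# Attach the definition and the suspect flag by matching on the code
idx <- match(obs$qcode, qcode.lookup$qcode)
obs$definition <- qcode.lookup$definition[idx]
obs$suspect    <- qcode.lookup$suspect[idx]

# Keep only the observations that are not flagged as suspect
obs.clean <- obs[!obs$suspect, ]
```

Keeping the flag as a column, rather than deleting rows up front, means the excluded points remain available for later checks.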

It is important to make sure that results are robust to decisions about which data are included or excluded. If it is not clear whether some data points should be in or out, then it may be necessary to run the analysis with and without the suspect data. If the results don’t change much, we can be confident the suspect points are not highly influential. If the results do change, we have more work to do: investigating the data and perhaps collecting more.
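
A minimal sketch of running an analysis with and without suspect data follows. The statistic (mean flow), the made-up values and the choice of which codes count as suspect are all assumptions for illustration; key.result is a hypothetical stand-in for whatever result matters in a real analysis.

```r
# Made-up flows with quality codes; the 40.0 value carries a poor-quality code
flow.eg  <- c(2.1, 3.4, 2.8, 40.0, 2.5, 3.0)
qcode.eg <- c(1, 2, 1, 151, 1, 2)

# ASSUMPTION: codes of 151 and above are treated as suspect
suspect <- qcode.eg >= 151

# Hypothetical key result: the mean flow
key.result <- function(x) mean(x)

with.suspect    <- key.result(flow.eg)
without.suspect <- key.result(flow.eg[!suspect])

# Relative change in the result when the suspect data are dropped
rel.change <- (without.suspect - with.suspect) / with.suspect

round(c(with = with.suspect, without = without.suspect, rel.change = rel.change), 3)
```

In this made-up case a single suspect point changes the mean substantially, which is the situation where further investigation of the data would be needed.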

We can use R to summarise and visualise quality codes. The functions table and prop.table will tabulate the codes and determine the proportion of the data with each code.

A tree map can be useful to visualise the proportion of the various codes within a data set.

Treeplot of quality codes

A time series of data can be coloured by quality code.

Time series of data coloured by quality code
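
Runs of consecutive days with the same code can also be listed numerically, which may help when looking for extended periods of poor data. The following is a minimal base R sketch using rle() on a short, made-up sequence of codes; the data and the names runs and run.table are hypothetical.

```r
# Made-up daily quality codes
xdate.eg <- as.Date("2000-01-01") + 0:9
qcode.eg <- c(1, 1, 1, 151, 151, 2, 2, 2, 2, 1)

# Run-length encoding: the length and value of each run of identical codes
runs <- rle(qcode.eg)

# Start and end position of each run within the series
run.end   <- cumsum(runs$lengths)
run.start <- run.end - runs$lengths + 1

# One row per run, with its code, start date, end date and length in days
run.table <- data.frame(
  qcode = runs$values,
  start = xdate.eg[run.start],
  end   = xdate.eg[run.end],
  days  = runs$lengths
)
run.table
```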

A tile plot may show seasonal patterns in occurrence of quality codes.

Tile plot of quality codes
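
A seasonal pattern can also be checked numerically by cross-tabulating quality code against month. This is a minimal base R sketch on simulated data; the codes are drawn at random, so no real seasonal pattern is expected, and the probabilities and variable names are assumptions for illustration.

```r
set.seed(1)

# Two years of simulated daily quality codes
xdate.eg <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "1 day")
qcode.eg <- sample(c(1, 2, 151), length(xdate.eg), replace = TRUE,
                   prob = c(0.70, 0.25, 0.05))

# Month as a factor with levels Jan..Dec so the columns appear in calendar order
month.eg <- factor(month.abb[as.integer(format(xdate.eg, "%m"))],
                   levels = month.abb)

# Counts of each code in each month
month.table <- table(qcode.eg, month.eg)

# Percentage of each month's data with each code
# (columns sum to approximately 100 after rounding)
month.pct <- round(100 * prop.table(month.table, margin = 2), 1)
month.pct
```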

A boxplot may show that particular codes are associated with particular data ranges.

Boxplot of flow by quality code
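
The same comparison can be made numerically by summarising flow for each quality code. A minimal base R sketch on made-up data follows; the values and the names eg and flow.summary are hypothetical.

```r
# Made-up flows and quality codes
eg <- data.frame(
  flow  = c(2.1, 3.4, 2.8, 40.0, 2.5, 3.0, 35.0, 2.9),
  qcode = c(1, 2, 1, 151, 1, 2, 151, 2)
)

# Split flow by code, then compute the count, minimum, median and maximum
# for each group; the result is a matrix with one row per quality code
flow.summary <- do.call(rbind, lapply(split(eg$flow, eg$qcode), function(x) {
  c(n = length(x), min = min(x), median = median(x), max = max(x))
}))
flow.summary
```

Here the rows are named by quality code, so a single value can be looked up with, for example, flow.summary["151", "median"].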

Some example R code follows. It is also available as a Gist.

library(treemap)
library(RColorBrewer)
library(ggplot2)
library(grid)
library(lubridate)

remove(list = objects())
# Generate some data

# dates
set.seed(2015)

xdate <- seq(as.Date('2000-01-01'),as.Date('2002-12-31'), by = '1 day' )

# generate some flow data, usually this would come from a stream gauging record
flow <- rlnorm(n= length(xdate), meanlog = 1, sdlog=0.5)

qcodes <- c(1, 2, 15, 50, 151, 240, 255)
qcode.seq <- sample(qcodes, length(xdate), replace = TRUE, prob = 1/qcodes)

my.data <- data.frame(xdate = xdate, flow = flow, qcode = qcode.seq)
# my.data can be replaced with a data set

# Tabulate quality codes

with(my.data, table(qcode))

#   1   2  15  50 151 255 
# 704 340  37   9   4   2 

# Percentage of data with each code
with(my.data, round(100*prop.table(table(qcode)), 2 ))
#     1     2    15    50   151   255 
# 64.23 31.02  3.38  0.82  0.36  0.18 

# Tree map of quality codes
qcode.table.df <- with(my.data, as.data.frame(table(qcode)))
treemap(qcode.table.df, index="qcode", vSize="Freq", title="", algorithm="pivotSize", palette=rev(brewer.pal(8, "RdYlGn")))


# Tile plot showing the occurrence of quality codes
# set up x-axis labels (day of year on which each month starts)
month.start <- yday( seq(as.Date('2000-01-01'),as.Date('2000-12-31'), by = 'months' ))


pl<- ggplot(my.data,aes(yday(xdate), year(xdate)))
pl + geom_tile(aes(fill = factor(qcode))) + 
  scale_fill_manual(values = rev(brewer.pal(8, "RdYlGn")), name='Quality Code' ) +
  theme_bw() +
  theme(axis.title.x = element_text(colour="grey20", size=20, vjust=-2)) +
  theme(axis.text.x = element_text(colour="grey20",size=12)) +
  theme(axis.title.y = element_text(colour="grey20",size=20, vjust=0)) +
  theme(axis.text.y = element_text(colour="grey20",size=12)) +
  theme(legend.title = element_text(colour="grey20",size=12)) +
  theme(plot.margin = unit(c(2.5, 2.5, 2.5, 2.5), "cm")) +
  ylab("Year") +
  scale_x_continuous(name="Month", breaks=month.start, labels=month.abb)


# Time series showing the occurrence of quality codes

pl <- ggplot(my.data, aes(x=xdate, y=flow, color=factor(qcode) ))
pl + geom_point() +
  scale_color_manual(values = rev(brewer.pal(8, "RdYlGn")), name = "Quality code") +
  theme_bw() +
  theme(axis.title.x = element_text(colour="grey20", size=20, vjust=-2)) +
  theme(axis.text.x = element_text(colour="grey20",size=10)) +
  theme(axis.title.y = element_text(colour="grey20",size=20, vjust=0)) +
  theme(axis.text.y = element_text(colour="grey20",size=15)) +
  theme(legend.title = element_text(colour="grey20",size=12)) +
  theme(legend.text = element_text(colour="grey20",size=12)) +
  theme(plot.margin = unit(c(2.5, 2.5, 2.5, 2.5), "cm")) +
  labs(x = "Date") +
  labs(y = "Flow")

# Box plot by quality code

pl <- ggplot(my.data, aes(x=factor(qcode), y=flow))
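# Hide the boxplot's own outlier points (outlier.shape=NA) because geom_jitter already draws every point
pl + geom_boxplot(aes(fill=factor(qcode)), outlier.shape=NA) +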
  geom_jitter(alpha=0.3) +
  scale_fill_manual(values=rev(brewer.pal(8, "RdYlGn")), name="Quality code") +
  
  labs(x = "Quality code") +
  labs(y = "Flow") +
  
  theme_bw() +
  theme(axis.title.x = element_text(colour="grey20", size=20, vjust=-2)) +
  theme(axis.text.x = element_text(colour="grey20",size=12)) +
  theme(axis.title.y = element_text(colour="grey20",size=20, vjust=0)) +
  theme(axis.text.y = element_text(colour="grey20",size=12)) +
  theme(legend.title = element_text(colour="grey20",size=12)) +
  theme(plot.margin = unit(c(2.5, 2.5, 2.5, 2.5), "cm")) 
