Quality codes in hydrologic data – II

I’ve previously written about quality codes and some approaches to summarising them in a hydrologic data set. This post presents some additional graphical summaries of quality codes. The figures summarise around 1.8 million flow records (12-minute data from 1975 to the present). R code to produce these plots is available in a gist.

Box plot

The legend is placed inside the plot, the boxes are filled, and the axis title includes superscripts.


Figure 1: Boxplots of discharges associated with quality codes
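The gist is not reproduced here, so the following is only a base R sketch of the three features listed above, using synthetic data. The data frame `flow`, its columns `discharge` and `qcode`, the colours, and the m³/s units in the axis title are all hypothetical stand-ins for the post's `yarra` data.

```r
# Hypothetical synthetic data standing in for the 12-minute flow record
set.seed(1)
flow <- data.frame(
  discharge = c(rlnorm(200, 1, 0.5), rlnorm(50, 2, 0.5), rlnorm(30, 0.5, 0.5)),
  qcode     = factor(rep(c(1, 120, 151), times = c(200, 50, 30)))
)

codes <- levels(flow$qcode)
cols  <- c("grey80", "steelblue", "orange")   # one fill colour per quality code

# Filled boxes; plotmath expression() gives superscripts in the y-axis title
# (units of m^3 s^-1 are an assumption, adjust to the data)
b <- boxplot(discharge ~ qcode, data = flow, col = cols,
             xlab = "Quality code",
             ylab = expression(Discharge ~ (m^3 ~ s^-1)))

# Legend drawn inside the plotting region, inset slightly from the corner
legend("topright", inset = 0.02, legend = codes, fill = cols,
       title = "Quality code")
```

`boxplot()` invisibly returns its summary statistics, so `b$stats` holds the five-number summary for each code if the values are needed in a table.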

Time series coloured by quality code

The time series is faceted by year. Points in the legend are drawn larger than those in the plot to make the colours easier to distinguish.


Figure 2: Time series coloured by quality code
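A faceted plot like this can be approximated in base R with one panel per year. This sketch uses hypothetical hourly data (to keep it small) and a hypothetical code-to-colour mapping; the key points are a fixed colour per code, so every panel matches, and `pt.cex` in `legend()` to draw legend points larger than the plotted points.

```r
# Hypothetical hourly series for two years with three quality codes
set.seed(2)
time <- seq(as.POSIXct("1975-01-01", tz = "UTC"),
            as.POSIXct("1976-12-31 23:00", tz = "UTC"), by = "hour")
flow <- data.frame(
  time      = time,
  discharge = rlnorm(length(time), 1, 0.4),
  qcode     = sample(c(1, 120, 151), length(time), replace = TRUE,
                     prob = c(0.9, 0.07, 0.03))
)

# Fixed colour for each code so all panels share the same mapping
code_cols <- c("1" = "grey50", "120" = "red", "151" = "blue")
flow$col  <- code_cols[as.character(flow$qcode)]
flow$year <- format(flow$time, "%Y")

years <- unique(flow$year)
op <- par(mfrow = c(length(years), 1), mar = c(2, 4, 2, 1))
for (y in years) {
  d <- flow[flow$year == y, ]
  # small points in the panel itself
  plot(d$time, d$discharge, col = d$col, pch = 16, cex = 0.3,
       main = y, xlab = "", ylab = "Discharge")
  # larger points in the legend so the colours are easy to read
  legend("topright", legend = names(code_cols), col = code_cols,
         pch = 16, pt.cex = 1.5, bty = "n")
}
par(op)
```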

Time series showing occurrence of a single code

For this data set, quality code 120 means “Estimated data not using correlation (ave. dry weather hydrograph used) or Rating extrapolated”.


Figure 3: Occurrence of a single quality code in a time series
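One way to show where a single code occurs is to draw the full series in a muted colour and overplot only the records with that code. The sketch below does this in base R with hypothetical data; the rug along the x axis is an extra (not necessarily in the original figure) that marks occurrences even where points overlap.

```r
# Hypothetical 12-minute series where about 5% of records carry code 120
set.seed(3)
time <- seq(as.POSIXct("1975-01-01", tz = "UTC"), by = "12 min", length.out = 5000)
flow <- data.frame(
  time      = time,
  discharge = rlnorm(5000, 1, 0.4),
  qcode     = sample(c(1, 120), 5000, replace = TRUE, prob = c(0.95, 0.05))
)

target <- 120                     # quality code to highlight
hit    <- flow$qcode == target    # logical index of records with that code

# Whole series in grey, then the selected code on top in red
plot(flow$time, flow$discharge, type = "l", col = "grey70",
     xlab = "", ylab = "Discharge", main = paste("Quality code", target))
points(flow$time[hit], flow$discharge[hit], col = "red", pch = 16, cex = 0.4)
rug(as.numeric(flow$time[hit]), col = "red")
```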

Single code, single year


Figure 4: Occurrence of a single quality code in a chosen year
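Restricting the plot to one year is a simple subset on the time stamp. As a complement to the figure, the sketch below also uses `rle()` to list each unbroken run ("episode") of the code and its duration, assuming records are exactly 12 minutes apart with no gaps in the time stamps. The data and the two artificial blocks of code 120 are hypothetical.

```r
# Hypothetical two-year record of 12-minute codes, mostly code 1
time  <- seq(as.POSIXct("1975-01-01", tz = "UTC"),
             as.POSIXct("1976-12-31 23:48", tz = "UTC"), by = "12 min")
qcode <- rep(1, length(time))
qcode[1001:1100]   <- 120   # artificial block of 100 records in 1975
qcode[50001:50050] <- 120   # artificial block of 50 records in 1976
flow <- data.frame(time = time, qcode = qcode)

# Keep only the chosen year
sel <- flow[format(flow$time, "%Y") == "1975", ]

# Runs of consecutive records that carry code 120
runs   <- rle(sel$qcode == 120)
starts <- cumsum(runs$lengths) - runs$lengths + 1   # first row of each run
episodes <- data.frame(
  start   = sel$time[starts][runs$values],
  records = runs$lengths[runs$values]
)
episodes$hours <- episodes$records * 12 / 60        # 12 minutes per record

# Spike plot marking every record with the code in that year
plot(sel$time, as.integer(sel$qcode == 120), type = "h", col = "red",
     yaxt = "n", xlab = "", ylab = "Code 120", main = "Quality code 120 in 1975")
```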

Codes for selected years and months

The legend is placed on the plot and made transparent.


Figure 5: Discharge coloured by quality code for a chosen period
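Selecting years and months reduces to comparing the year and month parts of each time stamp. This base R sketch assumes a hypothetical choice of June to August in 1991 and 1992; the legend background is made semi-transparent with `adjustcolor()` (`bty = "n"` would make it fully transparent instead).

```r
# Hypothetical hourly series covering three years
set.seed(5)
time <- seq(as.POSIXct("1990-01-01", tz = "UTC"),
            as.POSIXct("1992-12-31 23:00", tz = "UTC"), by = "hour")
flow <- data.frame(
  time      = time,
  discharge = rlnorm(length(time), 1, 0.4),
  qcode     = sample(c(1, 120, 151), length(time), TRUE, c(0.9, 0.07, 0.03))
)

# Hypothetical period of interest
keep_years  <- c(1991, 1992)
keep_months <- 6:8
yr  <- as.integer(format(flow$time, "%Y"))
mo  <- as.integer(format(flow$time, "%m"))
sel <- flow[yr %in% keep_years & mo %in% keep_months, ]

code_cols <- c("1" = "grey50", "120" = "red", "151" = "blue")
plot(sel$time, sel$discharge, col = code_cols[as.character(sel$qcode)],
     pch = 16, cex = 0.4, xlab = "", ylab = "Discharge")
# Half-transparent white background keeps the data behind the legend visible
legend("topright", legend = names(code_cols), col = code_cols, pch = 16,
       bg = adjustcolor("white", alpha.f = 0.5))
```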

Missing data

Missing values (NA) are not drawn on the plots. Here I’ve recoded missing data as -50 so that they show up as points below the flow data. The facet for 1975, for example, shows several missing values.


Figure 6: Missing data plotted below discharges
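The recoding step can be done in a separate plotting column so the original discharge keeps its NA values and any later statistics are not contaminated by the placeholder. A minimal base R sketch with hypothetical data and an artificial gap:

```r
# Hypothetical record with an artificial gap coded 255
set.seed(6)
flow <- data.frame(discharge = rlnorm(1000, 1, 0.4), qcode = 1)
flow$discharge[101:150] <- NA
flow$qcode[101:150]     <- 255

# Plot-only copy: NA replaced by a value below the range of the data
missing_level <- -50
flow$plot_q <- ifelse(is.na(flow$discharge), missing_level, flow$discharge)

plot(seq_len(nrow(flow)), flow$plot_q, pch = 16, cex = 0.4,
     col = ifelse(is.na(flow$discharge), "red", "grey40"),
     xlab = "Record", ylab = "Discharge (missing shown at -50)")
```

The placeholder only works if it sits clearly below every real value, which holds here because discharge cannot be negative.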

The quality codes attached to missing data can be summarised easily with the dplyr package:

library(dplyr)

# Count the quality codes attached to missing discharge values
yarra %>%
  filter(is.na(discharge_229143A)) %>%
  count(discharge_229143A_qcode)

In this case, three codes have been used.

| Code | Number | Meaning |
|------|-------:|---------|
| 151 | 1,127 | Poor or unverified data. Use with caution. Data lost due to natural causes |
| 180 | 712 | Equipment malfunction |
| 255 | 16,747 | Gap |
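For readers without dplyr, the same count can be sketched in base R with `table()`. Since each record covers 12 minutes, the counts also convert directly to durations. The small data frame here is hypothetical.

```r
# Hypothetical discharges with gaps, each gap carrying a quality code
flow <- data.frame(
  discharge = c(1.2, NA, NA, 3.4, NA, 2.2, NA, NA, NA),
  qcode     = c(1, 151, 151, 1, 180, 1, 255, 255, 255)
)

# Codes attached to missing discharges (base R analogue of filter() + count())
missing_codes <- table(qcode = flow$qcode[is.na(flow$discharge)])

missing_summary <- data.frame(
  code    = names(missing_codes),
  records = as.vector(missing_codes),
  hours   = as.vector(missing_codes) * 12 / 60,   # 12 minutes per record
  stringsAsFactors = FALSE
)
missing_summary
```

Applied to the counts in the table above, the 16,747 records coded 255 span about 3,349 hours, or roughly 140 days.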

Tile plot

For this plot, there is one tile for each 12-minute record, and the tile colour shows the quality code.


Figure 7: Time stamp of quality codes as a tile plot
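A tile plot with one tile per 12-minute record can be built in base R with `image()`, once the codes are arranged in a matrix. The layout here (12-minute slot of the day on one axis, day on the other) is one plausible arrangement and may differ from the original figure; the data are hypothetical.

```r
# Hypothetical codes for 5 days of 12-minute records (120 records per day)
set.seed(7)
n_days <- 5
time   <- seq(as.POSIXct("1975-03-01", tz = "UTC"), by = "12 min",
              length.out = n_days * 120)
qcode  <- sample(c(1, 120, 151), length(time), TRUE, c(0.85, 0.1, 0.05))

# Row = 12-minute slot within the day (1..120), column = day (1..n_days)
slot <- as.integer(format(time, "%H")) * 5 + as.integer(format(time, "%M")) %/% 12 + 1
day  <- as.integer(as.Date(time) - as.Date(time[1])) + 1

codes <- sort(unique(qcode))
z <- matrix(NA_integer_, nrow = 120, ncol = n_days)
z[cbind(slot, day)] <- match(qcode, codes)   # index into the colour palette

cols <- c("grey80", "red", "blue")[seq_along(codes)]
image(x = 1:120, y = seq_len(n_days), z = z, col = cols,
      breaks = seq(0.5, length(codes) + 0.5, by = 1),
      xlab = "12-minute slot of the day", ylab = "Day")
legend("topright", legend = codes, fill = cols, bg = "white")
```

Using explicit `breaks` ties each integer in `z` to exactly one colour, so the legend stays correct even if a code is rare.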

Treemap

The area of each rectangle is proportional to the number of records with that quality code.


Figure 8: Frequency of quality codes as a treemap
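The figure may well have been drawn with a dedicated treemap package; as a dependency-free alternative, the sketch below implements a simple binary-split treemap layout in base R. Codes (sorted by count) are split into two groups of roughly equal total, each group gets a share of the rectangle along its longer side, and the process repeats, so each tile's area is proportional to its count. The counts are hypothetical.

```r
# Hypothetical record counts per quality code
counts <- c("1" = 1500000, "120" = 180000, "151" = 60000, "180" = 20000, "255" = 40000)

# Recursive binary split of the rectangle (x, y, w, h) among the values in p
split_layout <- function(p, x, y, w, h) {
  if (length(p) == 1) {
    return(data.frame(code = names(p), x = x, y = y, w = w, h = h,
                      stringsAsFactors = FALSE))
  }
  cum  <- cumsum(p)
  k    <- which.min(abs(cum[-length(p)] - sum(p) / 2))  # split after element k
  frac <- unname(cum[k] / sum(p))                        # share of first group
  if (w >= h) {   # cut across the wider side
    rbind(split_layout(p[1:k], x, y, w * frac, h),
          split_layout(p[-(1:k)], x + w * frac, y, w * (1 - frac), h))
  } else {
    rbind(split_layout(p[1:k], x, y, w, h * frac),
          split_layout(p[-(1:k)], x, y + h * frac, w, h * (1 - frac)))
  }
}

p  <- sort(counts, decreasing = TRUE)
tm <- split_layout(p, 0, 0, 1, 1)   # layout in the unit square

plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1), asp = 1)
rect(tm$x, tm$y, tm$x + tm$w, tm$y + tm$h,
     col = hcl.colors(nrow(tm), "Set 2"), border = "white", lwd = 2)
text(tm$x + tm$w / 2, tm$y + tm$h / 2,
     labels = paste0(tm$code, "\n", round(100 * p[tm$code] / sum(p), 1), "%"))
```

Very small tiles will have unreadable labels; a real treemap package also offers squarified layouts that keep tiles closer to square.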

Summary of all quality codes

Summary data for quality codes can be formatted and copied to the clipboard on a Mac (the pbcopy command used below is macOS-specific). The flow data are in a data frame called yarra, and the quality codes are in a column called discharge_229143A_qcode.

# Quality codes
library(dplyr)

# Make a table of the codes used, with counts and percentages
qcode_table <- yarra %>%
  count(discharge_229143A_qcode) %>%
  mutate(pc = round(100 * n / sum(n), 2),     # percentage of all records
         n  = format(n, big.mark = ",")) %>%  # thousands separator for display
  rename(`Quality code`   = discharge_229143A_qcode,
         Number           = n,
         `Proportion (%)` = pc)

# Write to the clipboard as tab-separated text (pbcopy is macOS only)
clip <- pipe("pbcopy", "w")
write.table(qcode_table, file = clip, sep = "\t", row.names = FALSE, quote = FALSE)
close(clip)
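Because `pbcopy` exists only on macOS, a small wrapper can choose a clipboard command by operating system. This is a sketch under stated assumptions: on Linux it assumes the `xclip` utility is installed, on Windows it relies on R's `file("clipboard")` connection, and the function name `copy_table` is hypothetical. Passing any open connection as `con` bypasses the clipboard, which is also handy for checking the output.

```r
# Hypothetical helper: write a data frame as tab-separated text to the clipboard
copy_table <- function(x, con = NULL) {
  if (is.null(con)) {
    con <- switch(Sys.info()[["sysname"]],
                  Darwin  = pipe("pbcopy", "w"),                    # macOS
                  Linux   = pipe("xclip -selection clipboard", "w"), # assumes xclip installed
                  Windows = file("clipboard", "w"),                 # R's Windows clipboard connection
                  stop("No clipboard command configured for this system"))
    on.exit(close(con))
  }
  write.table(x, file = con, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(x)
}
```

With the table built above, `copy_table(qcode_table)` would replace the three pbcopy lines.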


All code is available as a gist.

References

Bureau of Meteorology (2015) Streamflow quality codes for hydrologic reference stations. Available on the Bureau of Meteorology website and archived on the Wayback Machine.

Sinclair Knight Merz (2010) Developing guidelines for the selection of streamflow gauging stations. Available on the Bureau of Meteorology website and archived on the Wayback Machine. Appendix A of this report lists the quality codes used in many Australian jurisdictions.
