# Lead in rain tanks – Wellington, NZ

I’ve written about lead levels in rain tanks before here and in this book chapter.  This is just a quick post about some recent work in New Zealand (link).  Researchers looked at lead levels in six water tanks in the Wellington area and found that 69% of samples exceeded WHOs health-based guideline of 0.01 mg/L for lead.  This is consistent with the 32 studies that we summarised in Magyar and Ladson (2015).

### References

Stewart, C., Kim, N. D., Johnson, D. M. and Nayyerloo, M. (2016) Health hazards associated with consumption of roof-collected rainwater in urban areas in emergency situations.  Int. J. Environ. Res. Public Health 13(10):1012 doi:10.3390/ijerph13101012

Magyar, M. I. and Ladson, A. R. (2015) Chemical quality of rainwater in rain tanks. In: Sharma, A., Begbie, D. and Gardner, T.  Rainwater tank systems for urban water supply: design, yield, energy, health risks, economics and social perceptions.  IWA publishing. pp 207-224. Link to Chapter 9

# RORB: comparing modelled and measured hydrographs

[Update  15 Nov 2016: Calculations available as a spreadsheet]

RORB provides 7 statistics to summarise the match between  calculated and actual hydrographs.  A comparison of:

1. Peak discharge (m3/s)
2. Time to peak (hour)
3. Volume (m3)
4. Average absolute coordinate error (m3/s)
5. Time to centroid (hour)
6. Lag (centre of mass to centre of mass) (hour)
7. Lag to peak (hour)

An example is shown in Figure 1 below which compares a calculated and actual hydrograph for the Werribee River.  This is taken from section 10.2.1 of the RORB manual.  The difference between the hydrographs is summarised as an absolute error and a percentage error for each statistic.

During calibration, the losses and kc can be varied until the modeller is satisfied there is a reasonable match between the calculated and actual hydrographs across the range of statistics.  Generally modellers focus on peak discharge and volume but the other error measures are also useful.  For example, problems in the speed of the hydrograph (time to centroid and lag) can be related to the number of sub-catchments used in the RORB model (see the discussion here).

There is a brief summary of these statistics in Section 9.2.1.1 of the RORB manual.  In this blog I’ll discuss them in more detail.

Figure 1: Error statistics provided by RORB based on the differences between calculated and actual hydrographs

### Example data set

Example calculations are made using data from the Werribee River worked example (Section 10.2 of the RORB manual).  The actual and calculated hydrographs are listed on page 111 and 112.  See Figure 2.

Figure 2: Calculated and actual hydrographs for the Werribee worked example (p111 – 112 of the RORB manual)

### Peak discharge

The error in peak discharge is the difference between the maximum of the calculated RORB hydrograph and the maximum of the actual hydrograph provided as input to RORB.

For the Werribee example:

max(actual) = 356 m3/s

max(calculated) = 337.999 m3/s

Absolute difference = max(calculated) – max(actual) = 337.999 – 356 = – 18 m3/s

Percentage difference = 100×(max(calculated) – max(actual))/max(actual)

$\dfrac{-18}{356} \approx 5.1\%$

Note that these error measures are expressed in a the opposite way to that used by statisticians.  In statistics, residuals are usually expressed actual – calculated  e.g. $y - \hat{y}$.  Here we are reporting calculated – actual.  So if the difference is negative, the calculated value is too small.

### Time to peak

This is where it starts to get interesting.  Looking at the data in Figure 2, the peak flow for both the actual and calculated hydrographs occurs at the 9th time increment, which Figure 2 suggests, is at 18 hours.  However, look at the statistics in Figure 1; the time to peak is listed as occurring at 16 hours.  What’s going on?

We can get a clue by looking at the RORB control vector for the Werribee case study (see Figure 3).  The first line highlighted in green show that the time increment is 2 hours and calculations will be undertaken for 28 increments. The second line in green shows that the two actual hydrographs (Melton Reservoir outflow and Werribee weir) both start at the zeroth time increment and end at the 28th time increment.  The trick is that increments start at zero not 1.  The increments shown in Figure 2 start at 1 not zero.  This is an error in reporting.  The first flows listed in Figure 2 actually occur at time zero so the maximum flows occur after 16 hours, not 18 hours.  Its a bit hard to tell, but if you look closely at the graph in Figure 1, the maximum flow occurs at 16 hours not 18 hours.

Figure 3: RORB control vector from the Werribee case study (p106 of the RORB manual)

There seems to be an ‘out by one’ error in the RORB output file.  Other case studies in the RORB manual have the same issue.  This creates an inconsistency between the on-screen statistics, such as those shown in Figure 1, and the RORB text output.

### Volume

RORB estimates hydrograph volumes as the mean flow times the length of the hydrograph.

For the data shown in Figure 2, the means are:

• Calculated 119.0093 m3/s
• Actual 122.4138 m3/s

To convert these mean flows to volumes (m3) multiply by the length of the hydrograph.  In this case, there are 29 increments (remember we start at zero and finish at 28).  Each increment is 2 hours long so we need to multiply by 2 × 60 × 60 to convert hours to seconds.

• Volume of calculated hydrograph 119.0093 × 29 × 2 × 60 × 60 = 24848021 ≈ 0.25 × 108 m3
• Volume of actual hydrograph 122.4138 × 29 × 2 × 60 × 60 = 25560000 ≈ 0.26 × 108 m3

The absolute difference is 24848021 – 25560000 =  -711979 ≈ -0.71 × 106 m3

Percentage difference:

100 × (24848021 – 25560000)/25560000 ≈ -2.8%

### Average absolute coordinate error

The average absolute coordinate error is the mean of absolute values of the differences between the actual and calculated hydrograph at each time step.

$\mathrm{AACE} = \dfrac{\sum\limits_{i }^{N} \lvert a_i - c_i \rvert}{N+1}$

Where:

• AACE is the average absolute coordinate error
• $a_i$ is the flow at each $i$ time increment for the actual hydrograph
• $c_i$ is the flow at each $i$ time increment for the calculated hydrograph
• There are $N + 1$ time increments.

For the Werribee case study, the first time step  is at zero and the final is at 28 so there are 29 increments in total.

The average absolute coordinate error =

$\dfrac{\lvert 0 - 0 \rvert + \lvert 0 - 0\rvert + \lvert 8 - 8.007 \rvert + \lvert 34 - 37.806 \rvert + \ldots}{28+1} \approx 7.6 \; \mathrm{m^3 s^{-1}}$

The percentage error is the average absolute coordinate error divided by the average flow of the actual hydrograph.

For the Werribee case study, the average flow during the actual hydrograph is 122.4138 m3/s so the percentage average absolute coordinate error is:

$100 \times \dfrac{7.6}{122.4} \approx 6.2 \%$

### Time to centroid

The time to the centroid of the hydrograph can be be calculated as sum of flow × time divided by the sum of the flow

$\dfrac{\sum{}{}t_iQ_i}{\sum{}{}Q_i}$

Looking at the data in Figure 2, and correcting for the fact that the first time step is at zero, not 2 hours,  the centroid for the actual hydrograph will be:

$\frac{0 \times 0 + 2 \times 0 + 4 \times 8 + 6\times 34 + \ldots}{0+0+8+35+\ldots}$

Time to centroid (actual) = 24.11662 hour

Time to centroid (calculated) = 23.44448 hour

Absolute error = 23.4 4- 24.11 = -0.672 ≈-0.7 hour

Percentage error in the time to centroid is the absolute error divided by the time to centroid of the actual hydrograph.

$\dfrac{-0.672}{24.117} \approx -2.8\%$

Lag (centre of mass to centre of mass)

The RORB manual (p101) defines the lag (c.m to c.m) as:

the time from the centroid of concentrated inputs to the channels upstream of the point concerned, including both sub-area inflows and concentrated channel inflow hydrographs, to the time of centroid of the hydrograph.

For the Werribee worked example, the concentrated input is the hydrograph at Melton Reservoir.  This is listed in Figure 3 (the two lines under the green box):

0, 0, 66, 150, 253, …

The time to the centroid of this hydrograph can be calculated using the same procedure as before.

Time to centroid of the input hydrograph at Melton Reservoir is 19.90657  hour.

The difference in time between the centre of mass of the input hydrograph and that of the actual hydrograph is:

24.111662 – 19.90657 = 4.21 ≈ 4.2 hour

The difference in time between the centre of mass of the input hydrograph and that of the calculated hydrograph is:

23.44448 – 19.90657 = 3.53823 ≈ 3.5379 hour

The absolute difference between these values is:

3.5379- 4.2101 ≈ -0.672 ≈ -0.7 hour

The percentage difference is the absolute difference divided by the difference in time between the centre of mass of the input hydrograph and that of the actual hydrograph.

100 × -0.672/4.2101 = -15.965% ≈ -16%

So the calculated value is -0.7 hour (16%) smaller than the actual value.  This means the calculated hydrograph is moving through the modelled reach of the Werribee River more quickly than the real hydrograph is moving through the real reach.  A modeller could adjust kc to slow the modelled hydrograph (but this would also affect other aspects of the calculated hydrograph, such as  the peak).

### Lag to peak

The lag to peak is the time from the centroid of the concentrated input to channels upstream of the point concerned, including both sub-areas inflows and concentrated channel inflow hydrographs, to time of peak of the hydrograph.

For Werribee worked example, the time to the centroid of the input hydrograph (at Melton Reservoir) is 19.90657 hour and the peak of the actual output hydrograph is at 16 hours so the absolute difference is:

16 – 19.90657 ≈ -3.9 hours

The peak of the calculated hydrograph  is also at 16 hours so the lag to peak of the calculated and actual hydrographs is the same i.e. the error is zero.

### Conclusion

The range of error statistics in RORB provides a summary of model performance which can be used as a guide during calibration.  Most modellers concentration on matching peaks and volumes but the other statistics also provide insight into aspects where the model can be improved.

Example calculations are available as an excel spreadsheet and a gist.

# Detecting outliers in water quality data

This post proposes a simple method to detect outliers in water quality data.  This is based on a proposal by John Tukey as discussed in the Wikipedia article: Outliers.

Tukey proposed a method that could be used to flag outliers based on the interquartile range.  An outlier is any observation falls outside:

${[} Q_1 - k(Q_3 - Q_1) {]}, {[} Q_3 + k(Q_3 - Q_1) {]}$

Where, Q1 and Q3 are the lower and upper quartiles.

Tukey proposed that k = 1.5 could be used to flag outliers, while k = 3 suggests observations that are “far out”.

A guide to whether the maximum or minimum value in a dataset are outliers is to calculate their k values.

$k_{max} = \frac{max(O) - Q_3}{Q_3 - Q_1}$

$k_{min} = \frac{ Q_1 - min(O)}{Q_3 - Q_1}$

The appealing feature of using the interquartile range is that it is resistant to outliers so there is reduced change of the statistic used to flag any outlier being influenced by outliers, an issue known as masking.

A function to calculate the k value for the maximum and minimum of a dataset is as follows:

Tukey_k <- function(x){
my.quantile <- quantile(x, na.rm = TRUE)
Q_25 <- my.quantile[2]
Q_75 <- my.quantile[4]

k_max <- as.vector((max(x, na.rm = TRUE) - Q_75)/(Q_75 - Q_25))
k_min <- as.vector((Q_25 - min(x, na.rm = TRUE))/(Q_75 - Q_25))

data.frame(k_max = k_max, k_min = k_min)

}

Let’s run some tests.

If we calculate the k values for a large normally distributed dataset (say 100,000 observations) they are less than 3.  For these data, getting a value larger than 3 is obviously rare.

set.seed(2016)
Tukey_k(rnorm(100000))
# k_max k_min
# 1 2.530309 2.531518

For water quality observations, a review by Duncan (1999) suggested the log-normal distribution was a good approximation for all the data he looked at so we need to take logs of the observations before calculating the k values.

For some data on Nickel concentration I’m working on, the k for the maximum value is 4.3.  Clearly a ‘far out’ value and one that deserves checking, via a histogram or QQ plot (see below).  Of the 12,000 values in this dataset there are clearly 15 or so that are very large (relative to the bulk of the data).

Code is available as a gist.

Identify, describe, plot and remove outliers from the dataset

# Rating curve resources

Hydrologists are often interested in the highest flows and that means we are using the upper limits of rating tables where uncertainty is large.

Below are some links that explain how rating curves are developed and potential issues with current practice.

Fenton, J. D. and Keller, R. J. (2001) The calculation of streamflow from measurements of stage.  Cooperative Research Centre for Catchment Hydrology.  Technical report 01/6.  (link)

USGS

• How streamflow is measured
• Buchanan, T. J. and Somers, W. P. (1969) Discharge measurements at gaging stations.  U. S. Geological Survey Techniques of Water-Resources Investigations Book 3, Chapter A8, pp. 65. (link)

John Fenton’s papers

The NZ Hydrological Society Ratings Workshop 2016

Stu Hamilton’s blog at Aquatic Informatics

Stu Hamilton’s Whitepaper: 5 Best Practices for Building Better Stage-Discharge Rating Curves.

### Journal articles

Van Eerdenbrugh, K., Van Hoey, S. and Verhoest, N. E. C. (2016) Identification of temporal consistency in rating curve data: Bidirectional Reach (BReach). Water Resources Research 52(8):6277-6296. (link)

Steinbakk, G. H., Thorarinsdottir, T. L., Reitan, T., Schlichting, L., Holleland, S. and Engeland, K. (2016) Propagation of rating curve uncertainty in design flood estimation.  Water Resources Research. DOI: 0.1002/2015WR018516  (link).

Kuczera, G. (1996) Correlated rating curve error in flood frequency inference.  Water Resources Research.  32(7):2119-2127 (link)

Mansanarez, V., Le Coz, J., Renard, B., Lang, M., Pierrefeu, G. and Vauchel, P. (2016) Bayesian analysis of stage-fall-discharge rating curves and their uncertainties.  Water Resources Research. DOI: 10.1002/2016WR018916 (link)

# Victoria’s open data directory

The Victorian government has set up an open data directory that has some great information for hydrologists.

A few of the data sets:

There is a nice an analysis of the snow data here (courtesy of Truii).

# The State of Hydrologic Practice in Victoria, Australia

Edit: 30 Nov 2016.  Slides from conference presentation and handout.

Edit: 11 Nov 2016.  The final version of the paper is here.

I’ve put together a conference that provides a summary of current hydrologic practice in Victoria based on a review of 20 recent flood studies.

Some highlights:

• The RORB hydrologic model was used in 17 of the 20 studies.  Clearly RORB is popular and useful even though much of the core capability dates from the 1970s.
• Climate change impacts on flooding were only considered in half the studies.  There has been other work that suggests Australian hydrologists and their clients lag behind their international counterparts in considering climate change (e.g. this study by Caroline Wenger and others).  This is surprising given that Engineers Australia issued climate change guidelines in 2003.  The latest guidelines are available here.
• The number of sub-catchments used in RORB models has increased greatly with the use of tools such as ArcHydro and CatchmentSim. There was an average of 92 sub-catchments in the studies I looked at.  RORB models from 1970 to the mid 1990s generally used 10 and 20 sub-catchments.
• No one used Monte Carlo or ensemble approaches to hydrologic flood modelling despite these features being available in RORB (For a summary of RORB’s capability see this conference paper from 2006).
• Rainfall temporal patterns were taken from the 1987 version of Australian Rainfall and Runoff.  RORB provides a tool to extract temporal patterns from local pluviographs but this was not used.
• Most studies used uniform spatial patterns for design rainfalls.  The new recommendations from Australian Rainfall and Runoff are that spatial patterns should reflect variations in rainfall across catchments (ARR Book 2, Chapter 6).  Clearly this will require a change in practice.
• The baseflow contribution to design flood hydrographs was generally not considered.  Again there is increased emphasis on this in the new ARR.

Ladson, A. R. (2016) The State of Hydrologic Practice in Victoria, Australia.  Hydrology and Water Resources Symposium.  28 Nov – 2 Dec 2016, Queenstown, NZ.

# Impact of climate change on evaporation

Will climate change increase or decrease evaporation?  The simple view is that as temperature increases, so will evaporation.  However when researchers have actually looked at measurements of pan evaporation, it is decreasing 1.

It turns out there are multiple competing effects.  Temperature is increasing but so is the amount of CO2 in the atmosphere. Carbon dioxide is a great plant fertilizer so we would expect more plant growth and in fact that is happening. The Earth is getting greener 2, and in particular, the leaf area is increasing as measured by Leaf Area Index (LAI)3. LAI is leaf area divided by land area.

More leaves means more evapotranspiration from plants right? Not necessarily.  With higher atmospheric CO2 plants use water more efficiently4. The stomata on the leaves don’t have to open so much which decreases evapotranspiration. The increased CO2 in the atmosphere works to both increase evapotranspiration – by increasing leaf area –  and decrease evapotranspiration – by increasing the efficiency with with plants use water.

Recent work has quantified all these effects5.  It is now clear that plant water use efficiency is increasing but this is outweighed by the increase in leaf area.  Evapotranspiration from plants is increasing.  Also, the increase in the leaf area means more rainfall is intercepted before it reaches the ground and this intercepted water is also evaporated.

More evaporation from plants means more water in the air and therefore less evaporation from meteorological instruments such as pans.  This could explain the measured decrease in pan evaporation in recent decades.  However climate change influence on evaporation is likely to be specific to location.  The ocean contributes most evaporation and recent work suggests the water cycle is intensifying i.e. more rainfall and more evaporation.  This also suggests that wet areas are become wetter and dry areas drier6.  Whether there is increased or decreased evaporation as measured by a pan depends on where the pan is.

### References and further reading

1. Roderick, M. L. and G. D. Farquhar (2004) Changes in Australian pan evaporation from 1970 to 2002. International Journal of Climatology 24(9): 1077-1090. (link)

2. Zhu, Z et al. (2016) Greening of the Earth and its drivers. Nature Climate Change doi:10.1038/nclimate3004

3. Curry, J. (2016) Rise in CO2 has greened planet Earth (link).

4. Frank, D. C. et al. (2015) Water-use efficiency and transpiration across European forests during the Anthropocene. Nature Climate Change doi:10.1038/nclimate2614 (link).

5. Zhang, Y. et al. (2016) Multi-decadal trends in global terrestrial evapotranspiration and its components. Scientific Reports 6:19124 doi:10.1038/srep19124

6. Durack, P. J., Wijffels, S.E. and Matear, R. J. (2012) Ocean salinities reveal strong global water cycle intensification during 1950 to 2000. Science 2012 April 27 336(6080):455-8. doi:10.1126/science.1212222