# Coronavirus resources

Sources of information on the COVID-19 epidemic.  These are sites that I’m looking at.

# Stormwater walk in the Royal Botanic Gardens

[CANCELLED as at 16 March]

Melbourne Royal Botanic Gardens has successfully reduced water consumption by more than 60% since 1994, yet the gardens are still beautiful, functional and win awards.   How do they do it?

Please come and enjoy a joint event by the EA Victorian Water Panel and EIANZ: a Stormwater Walk in the Melbourne Royal Botanic Gardens, to discover water conservation strategies and stormwater management.

Notice: this is an event for members of EIANZ and Engineers Australia only.

Tour content:

• The use of stormwater harvesting to offset mains irrigation demands in a large and water-thirsty public environment
• Monitoring water quality and achieving significant improvement in key metrics including nitrogen and phosphorus reduction
• Water treatment plant – visit and discuss the role of treatment and automated systems for irrigation
• Not just wetlands – the various issues in managing an ornamental lake supplied by stormwater (Blue Green Algae, floating and submerged macrophytes)
• Where to next – the use of desalinated water?

For more details, and to register, please click through to the flyer.

The Stormwater Walk in Melbourne Royal Botanic Gardens is on 19 March 2020. [Has now been cancelled]

# Log-normal flood frequency analysis

Although the log-normal distribution is not usually used for flood frequency estimation in Australia, it can be useful to compare results with those from more common approaches.

The general equation for flood quantile estimation (from Chow, 1951) is: $x_t = \mu+K_t\sigma$      (1)

Where $K_t$ is the frequency factor that depends on the distribution of the flood data and the probability of interest, $\mu$ and $\sigma$ are the mean and standard deviation.

## Frequency factors

The classical frequency factor for the log-normal distribution is: $K_t = \left( \frac{1}{C_v} \right) \left[ \exp \left\{ Z\left( \log \left(1 + C_v^2 \right) \right)^\frac{1}{2} - 0.5\left( \log \left( 1 + C_v^2 \right) \right) \right\} -1\right]$      (2)

Where, $Z$ is a standard normal deviate and $C_v$ is the coefficient of variation $= \frac{\sigma}{\mu}$.  The standard normal deviate can be calculated, for example, using the qnorm function in R.  So for the 100-year flood, exceedance probability = 1/100, $Z = 2.326$.

An R function to calculate this log-normal frequency factor is:

```r
# Frequency factor for the log-normal distribution
# m - mean
# s - standard deviation
# p - exceedance probability

FF_LogNormal <- function(m, s, p) {
  cv <- s/m
  z <- qnorm(1 - p)
  (1/cv) * (exp(sqrt(log(1 + cv^2)) * z - 0.5 * log(1 + cv^2)) - 1)
}
```



### Example 1

Kite (2004) provides an example we can use for testing.

Annual maximum discharges of the Saint John River at Fort Kent, New Brunswick for the 37 years from 1927 to 1963 have a mean of 81,000 cfs and a standard deviation of 22,800 cfs. Estimate the 100-year flood.

As calculated below, Q100 = 148,000 cfs.

```r
m <- 81000
s <- 22800
p <- 0.01

Kt <- FF_LogNormal(m, s, p) # 2.948

(Q100 <- m + Kt * s)
# 148221.3
```



If flood data are log-normally distributed, that means the logarithms of the flood data are normally distributed.  This suggests there are two ways of calculating the flood quantiles.  We can use the data as-is along with the log-normal frequency factor (as in the above example), or take the logs of the data and use the frequency factor from the normal distribution.

### Example 2

Continuing with the example from Kite.  The mean and standard deviation of the logarithms of the 37 events from the Saint John River are 11.263 and 0.284 (in cfs units).

The frequency factor for the 100-year flood, based on the normal distribution, will be 2.326 so the 100-year flood estimate is 150,800 cfs.  Similar to the previous example but not exactly the same.

```r
m.log <- 11.263
s.log <- 0.284

Kt <- qnorm(1 - 0.01) # 2.326348

(Q100 <- exp(m.log + Kt*s.log))
# 150795.9
```


### Example 3

Let’s repeat this analysis using some Australian data.  An annual flood series for the Hunter River at Singleton is available here.

The two estimates for the 100-year flood are 10,500 cumec using data as-is and 13,800 cumec when calculating in the log domain.

```r
# Hunter River at Singleton
library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'

singleton <- source_data(my.url)
str(singleton)

m <- mean(singleton$`Peak (cumec)`) # 1401.7
s <- sd(singleton$`Peak (cumec)`)   # 2312.9
Kt <- FF_LogNormal(m, s, 0.01) # 3.917
(Q100 <- m + Kt * s)
# 10460.5

m.log <- mean(log(singleton$`Peak (cumec)`)) # 6.425
s.log <- sd(log(singleton$`Peak (cumec)`))   # 1.336
Kt <- qnorm(1 - 0.01) # 2.326
(Q100 <- exp(m.log + Kt*s.log))
# 13818.9
```



## Bayesian frequency factor

Kuczera (1999) derives a frequency factor for the log-normal distribution which takes account of the uncertainty in the parameters – the mean and standard deviation – which must be estimated from the flood data.  Assuming nothing is known about these parameters other than what we learn from the flood measurements (i.e. the Bayesian prior is noninformative), the frequency factor is: $K_t = t_{n-1} \times \sqrt{1 + \frac{1}{n}}$, where $t_{n-1}$ is the appropriate quantile of Student's t distribution with $n-1$ degrees of freedom and $n$ is the number of flood measurements.

Using this frequency factor will result in a more conservative estimate of flood quantiles. As the sample size increases, the Bayesian frequency factor approaches the frequency factor based on the normal distribution.
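This convergence is easy to check numerically; a quick sketch (not from the original post):

```r
# As n grows, the Bayesian frequency factor approaches the
# normal-distribution frequency factor qnorm(1 - p)
p <- 0.01
for (n in c(10, 30, 100, 1000)) {
  cat(n, qt(1 - p, n - 1) * sqrt(1 + 1/n), "\n")
}
qnorm(1 - p) # 2.326348, the limiting value
```

For n = 10 the factor is about 2.96, so small samples give noticeably more conservative quantile estimates.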

An R function to calculate this Bayesian log-normal frequency factor is:

```r
# Bayesian frequency factor for the log-normal distribution
# n - number of data points
# p - exceedance probability

FF_LogNormal_Bayes <- function(n, p) {
  qt(1 - p, n - 1) * sqrt(1 + 1/n)
}
```



### Example 4

Calculate the Bayesian estimate for the 100-year flood quantile using the data for the Hunter River at Singleton.

Calculations are below.  The estimate is 17,350 cumecs.

```r
library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'

singleton <- source_data(my.url)

m.log <- mean(log(singleton$`Peak (cumec)`)) # 6.425
s.log <- sd(log(singleton$`Peak (cumec)`))   # 1.336
Kt <- FF_LogNormal_Bayes(nrow(singleton), 0.01) # 2.4966

(Q100 <- exp(m.log + Kt*s.log))
# 17348
```


### Testing

So we have three flood quantile estimators for log-normally distributed data:

1. Take logs and use the frequency factor from the normal distribution.
2. Use the data as-is and calculate the frequency factor using equation 2.
3. Take logs and use the Bayesian frequency factor.

If the data set is large, all three approaches give essentially the same result.

```r
set.seed(2016)
my.flood <- rlnorm(1e6, 6, 1)  # generate 1 million flood values from a log-normal distribution

# True value
exp(6 + 1*qnorm(1 - 0.01)) # 4131.302

# Direct calculation of quantile
quantile(my.flood, probs = 0.99, type = 8) # 4114.213

# Taking logs and using normal distribution frequency factor
Q100.log <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - 0.01)
exp(Q100.log) # 4130.316

# Use frequency factor from equation 2
Q100 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), 0.01)
Q100 # 4127.146

# Bayes frequency factor
Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), 0.01)
exp(Q100.Bayes) # 4130.336
```



For small datasets, the estimates vary substantially.

First, let’s make a function that simulates 30 years of log-normal flood peaks and calculates the 100-year quantile using each method.

```r
Test_f <- function(i, p) {

  # i = dummy variable so the function can be included in a loop
  # p = exceedance probability

  my.flood <- rlnorm(30, 6, 1) # Generate 30 years of flood data
  Q100.normal <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - p)
  Q100.eq2 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), p)
  Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), p)
  data.frame(Q100.normal = exp(Q100.normal), Q100.eq2 = Q100.eq2, Q100.Bayes = exp(Q100.Bayes))
}
```



Now, we’ll call this multiple times and compare the average quantiles for the three methods with the true quantile.

```r
set.seed(5)
out <- lapply(1:10000, Test_f, p = 0.01)
out.df <- do.call(rbind, out)
colMeans(out.df)

# Q100.normal    Q100.eq2  Q100.Bayes
#    4334.727    3678.353    5204.641

# True quantile 4131
```



So, for this data set, on average, the quantile based on taking the logs and using the frequency factor from the normal distribution is about right.  The quantile based on equation 2 is too low, and the quantile using the Bayesian frequency factor is too high.

In another post, we’ll look at quantifying the uncertainty in quantile estimates.

## References

Chow, V. T. (1951) A general formula for hydrologic frequency analysis.  Transactions, American Geophysical Union 32:231-237.

Kite, G. W. (2004) Frequency and risk analysis in hydrology.  Water Resources Publications, LLC.

Kuczera, G. (1999) Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference.  Water Resources Research 35(5): 1551-1557.

# Lead in rain tanks

I’ve written about lead levels in rain tanks here and in this book chapter.  We (Magyar and Ladson, 2015) reported on 32 studies that found lead levels in tanks that exceeded drinking water guidelines.  I’ll continue to update this post with any more recent studies that I come across.

### Indonesia 2019

High lead levels were found in rainwater in West Kalimantan, which this paper attributes to corrosion of zinc roofs by acid rain (Khayan et al., 2019).

### Ghana 2019

Quality of water from rooftops in Tarkwa, Ghana.  All water samples had lead concentrations exceeding the WHO guideline value of 0.01 mg/L.  Average lead concentrations (n = 6): old rusty roof (0.097 mg/L), non-rusty roof (0.015 mg/L), directly captured rainwater (0.024 mg/L) (Agyemang et al., 2019).

### Bangladesh 2019

Rainwater quality was measured in tanks at 23 primary schools in southwest coastal Bangladesh.  Islam et al. (2019) report:

> Concentration of Pb exceeded the maximum allowable limit for drinking water indicated by WHO and Bangladesh drinking water guideline value in 92% and 61% of the samples respectively, and the mean concentration was 0.08 mg/l (8 times higher than the WHO guideline value). The Pb contamination possibly occurred from the painting on roof railing and roof stair room.

### South Australia 2018

Water samples were collected from 53 tanks in Adelaide, 365 samples in total.  In 47 out of the 53 tanks, lead was above the Australian Drinking Water Guideline values in at least one sample (with 180/365 samples above 0.01 ppm). (Chabaka et al., 2018)

### New Zealand 2016

Some recent work in New Zealand (link): researchers looked at lead levels in six water tanks in the Wellington area and found that 69% of samples exceeded the WHO’s health-based guideline of 0.01 mg/L for lead.

### Older studies that weren’t in our original list

Walter, D. and Inman, D. (1982) Rainwater as an alternative source in Nova Scotia.  Proceedings of the International Conference on Rain Water Cistern Systems. Honolulu, pp 202-210.

Daoud et al. (2011) found few issues associated with lead in roof-harvested rainwater in the West Bank, Palestinian Authority.  The acidity of rainwater was reduced by storage in concrete tanks or in storages excavated in limestone rock.

Daoud et al. (2011) also reference some papers that I don’t think we looked at and which, in their summaries, report low lead levels.

Simmons, G., Hope, V., Lewis, G., Whitemore, J. & Gao, W. Contamination of potable roof-collected rainwater in Auckland, New Zealand. Water Res 35:1518-1524.

Schets, F. M., Italiaander, R., Van den Berg, H. H. J. L. & de Roda Husman, A. M. Rain harvesting: quality assessment and utilization in the Netherlands. J. Water Health 8:224–235.

Handia, L., Tembo, J. M. & Mwiindwa, C. Potential of rainwater harvesting in urban Zambia. J. Phys. Chem. Earth 28:893–896

Zhu, K., Zhang, L., Hart, W., Liu, M. & Chen, H. Quality issues in harvested rainwater in arid and semi-arid Loess Plateau of Northern China. J. Arid Environ. 57:487–505.

Radaideh, J., Al-Zboon, K., Al-Harahsheh, A. & Al-Adamat, R. Quality assessment of harvested rainwater for domestic uses. Jordan J. Earth Environ. Sci. 2:26–31.

Sazakli, E., Alexopoulos, A. & Leotsinidis, M. Rainwater harvesting: quality assessment and utilization in Kefalonia Island, Greece. Water Res. 41:2039–2047.

### References

Agyeman, A., MacCarthy, J. and Muhammad, I. (2019) Assessment of the quality of water from rooftops (a case study of Nkamponasi in Tarkwa, Ghana).  International Journal of Scientific and Research Publications 9(5). ISSN 2250-3153.

Chabaka, E., Whiley, H., Edwards, J. and Ross, K. (2018) Lead, Zinc, Copper and Cadmium content of water from South Australian Rainwater Tanks.  International Journal of Environment Research and Public Health 15(7):1551 DOI: 10.3390/ijerph15071551

Daoud, A. K., Swaileh, K., Hussein, R. and Matani, M. (2011) Quality assessment of roof-harvest rainwater in the West Bank, Palestinian Authority.  Journal of Water and Health 9(3):525-33.

Islam, A., Akber, A., Rahman, A., Islam, A. and Kabir, P. (2019) Evaluation of harvested rainwater quality at primary schools of southwest coastal Bangladesh.  Environmental Monitoring and Assessment 191:80 https://doi.org/10.1007/s10661-019-7217-6

Khayan, K., Husodo, A. H., Astuti, I., Sudarmadji, S. and Djohan, T. S. (2019) Rainwater as a source of drinking water: health impacts and water treatment. Journal of Environmental and Public Health. https://doi.org/10.1155/2019/1760950

Magyar, M. I. and Ladson, A. R. (2015) Chemical quality of rainwater in rain tanks. In: Sharma, A., Begbie, D. and Gardner, T.  Rainwater tank systems for urban water supply: design, yield, energy, health risks, economics and social perceptions.  IWA publishing. pp 207-224. Link to Chapter 9

Stewart, C., Kim, N. D., Johnson, D. M. and Nayyerloo, M. (2016) Health hazards associated with consumption of roof-collected rainwater in urban areas in emergency situations.  Int. J. Environ. Res. Public Health 13(10):1012 doi:10.3390/ijerph13101012

# RORB: comparing modelled and measured hydrographs

[Update  15 Nov 2016: Calculations available as a spreadsheet]

RORB provides seven statistics to summarise the match between calculated and actual hydrographs.  A comparison of:

1. Peak discharge (m3/s)
2. Time to peak (hour)
3. Volume (m3)
4. Average absolute coordinate error (m3/s)
5. Time to centroid (hour)
6. Lag (centre of mass to centre of mass) (hour)
7. Lag to peak (hour)

An example is shown in Figure 1 below which compares a calculated and actual hydrograph for the Werribee River.  This is taken from section 10.2.1 of the RORB manual.  The difference between the hydrographs is summarised as an absolute error and a percentage error for each statistic.

During calibration, the losses and kc can be varied until the modeller is satisfied there is a reasonable match between the calculated and actual hydrographs across the range of statistics.  Generally modellers focus on peak discharge and volume but the other error measures are also useful.  For example, problems in the speed of the hydrograph (time to centroid and lag) can be related to the number of sub-catchments used in the RORB model (see the discussion here).

There is a brief summary of these statistics in Section 9.2.1.1 of the RORB manual.  In this blog I’ll discuss them in more detail.

Figure 1: Error statistics provided by RORB based on the differences between calculated and actual hydrographs

### Example data set

Example calculations are made using data from the Werribee River worked example (Section 10.2 of the RORB manual).  The actual and calculated hydrographs are listed on pages 111 and 112.  See Figure 2.

Figure 2: Calculated and actual hydrographs for the Werribee worked example (p111 – 112 of the RORB manual)

### Peak discharge

The error in peak discharge is the difference between the maximum of the calculated RORB hydrograph and the maximum of the actual hydrograph provided as input to RORB.

For the Werribee example:

max(actual) = 356 m3/s

max(calculated) = 337.999 m3/s

Absolute difference = max(calculated) – max(actual) = 337.999 – 356 ≈ -18 m3/s

Percentage difference = 100×(max(calculated) – max(actual))/max(actual) $= 100 \times \dfrac{-18}{356} \approx -5.1\%$

Note that these error measures are expressed in the opposite way to that used by statisticians.  In statistics, residuals are usually expressed as actual – calculated, e.g. $y - \hat{y}$.  Here we are reporting calculated – actual, so if the difference is negative, the calculated value is too small.

### Time to peak

This is where it starts to get interesting.  Looking at the data in Figure 2, the peak flow for both the actual and calculated hydrographs occurs at the 9th time increment, which Figure 2 suggests, is at 18 hours.  However, look at the statistics in Figure 1; the time to peak is listed as occurring at 16 hours.  What’s going on?

We can get a clue by looking at the RORB control vector for the Werribee case study (see Figure 3).  The first line highlighted in green shows that the time increment is 2 hours and calculations will be undertaken for 28 increments. The second line in green shows that the two actual hydrographs (Melton Reservoir outflow and Werribee weir) both start at the zeroth time increment and end at the 28th time increment.  The trick is that increments start at zero, not 1.  The increments shown in Figure 2 start at 1, not zero.  This is an error in reporting.  The first flows listed in Figure 2 actually occur at time zero, so the maximum flows occur after 16 hours, not 18 hours.  It’s a bit hard to tell, but if you look closely at the graph in Figure 1, the maximum flow occurs at 16 hours, not 18 hours.

Figure 3: RORB control vector from the Werribee case study (p106 of the RORB manual)

There seems to be an ‘out by one’ error in the RORB output file.  Other case studies in the RORB manual have the same issue.  This creates an inconsistency between the on-screen statistics, such as those shown in Figure 1, and the RORB text output.
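A simple way to avoid the out-by-one trap is to build the time axis explicitly, starting at zero. A minimal sketch using the increment settings from Figure 3:

```r
# 2-hour increments, ordinates at increments 0 to 28 (29 values in total)
dt <- 2                                   # hours
times <- seq(0, by = dt, length.out = 29)
times[9]        # the 9th ordinate is at 16 hours, not 18
tail(times, 1)  # final ordinate is at 56 hours (increment 28)
```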

### Volume

RORB estimates hydrograph volumes as the mean flow times the length of the hydrograph.

For the data shown in Figure 2, the means are:

• Calculated 119.0093 m3/s
• Actual 122.4138 m3/s

To convert these mean flows to volumes (m3) multiply by the length of the hydrograph.  In this case, there are 29 increments (remember we start at zero and finish at 28).  Each increment is 2 hours long so we need to multiply by 2 × 60 × 60 to convert hours to seconds.

• Volume of calculated hydrograph 119.0093 × 29 × 2 × 60 × 60 = 24848021 $\approx 0.25 \times 10^8$ m3
• Volume of actual hydrograph 122.4138 × 29 × 2 × 60 × 60 = 25560000 $\approx 0.26 \times 10^8$ m3

The absolute difference is 24848021 – 25560000 = -711979 $\approx -0.71 \times 10^6$ m3

Percentage difference:

100 × (24848021 – 25560000)/25560000 ≈ -2.8%
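These volume calculations can be reproduced in R, using the mean flows quoted above:

```r
# Volume = mean flow x number of ordinates x seconds per 2-hour increment
mean.calc <- 119.0093  # m3/s
mean.act  <- 122.4138  # m3/s

vol.calc <- mean.calc * 29 * 2 * 60 * 60
vol.act  <- mean.act  * 29 * 2 * 60 * 60

vol.calc - vol.act                   # about -711,000 m3
100 * (vol.calc - vol.act)/vol.act   # about -2.8%
```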

### Average absolute coordinate error

The average absolute coordinate error is the mean of the absolute values of the differences between the actual and calculated hydrographs at each time step: $\mathrm{AACE} = \dfrac{\sum\limits_{i=0}^{N} \lvert a_i - c_i \rvert}{N+1}$

Where:

• AACE is the average absolute coordinate error
• $a_i$ is the flow at each $i$ time increment for the actual hydrograph
• $c_i$ is the flow at each $i$ time increment for the calculated hydrograph
• There are $N + 1$ time increments.

For the Werribee case study, the first time step  is at zero and the final is at 28 so there are 29 increments in total.

The average absolute coordinate error = $\dfrac{\lvert 0 - 0 \rvert + \lvert 0 - 0\rvert + \lvert 8 - 8.007 \rvert + \lvert 34 - 37.806 \rvert + \ldots}{28+1} \approx 7.6 \; \mathrm{m^3 s^{-1}}$

The percentage error is the average absolute coordinate error divided by the average flow of the actual hydrograph.

For the Werribee case study, the average flow during the actual hydrograph is 122.4138 m3/s so the percentage average absolute coordinate error is: $100 \times \dfrac{7.6}{122.4} \approx 6.2 \%$
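In R, the AACE is just the mean of the absolute differences, since mean() divides by the number of ordinates, N + 1. A sketch where only the first four pairs of ordinates are from Figure 2 and the rest are hypothetical:

```r
# Illustrative hydrographs: first four ordinates from the worked example,
# remaining values are made up
actual     <- c(0, 0, 8,     34,     100, 250, 356, 250, 120)
calculated <- c(0, 0, 8.007, 37.806, 95,  240, 338, 260, 125)

aace <- mean(abs(actual - calculated))
aace                       # 5.757
100 * aace / mean(actual)  # percentage form
```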

### Time to centroid

The time to the centroid of the hydrograph can be calculated as the sum of flow × time divided by the sum of the flow: $\dfrac{\sum_i t_iQ_i}{\sum_i Q_i}$

Looking at the data in Figure 2, and correcting for the fact that the first time step is at zero, not 2 hours, the centroid for the actual hydrograph will be: $\frac{0 \times 0 + 2 \times 0 + 4 \times 8 + 6\times 34 + \ldots}{0+0+8+34+\ldots}$

Time to centroid (actual) = 24.11662 hour

Time to centroid (calculated) = 23.44448 hour

Absolute error = 23.4445 – 24.1166 = -0.672 ≈ -0.7 hour

Percentage error in the time to centroid is the absolute error divided by the time to centroid of the actual hydrograph. $\dfrac{-0.672}{24.117} \approx -2.8\%$
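The centroid calculation is a flow-weighted mean of the ordinate times, which R computes directly. A sketch with hypothetical flows (not the Werribee data):

```r
# Time to centroid = flow-weighted mean of ordinate times
q <- c(0, 0, 8, 34, 100, 250, 356, 250, 120)  # hypothetical flows, m3/s
t <- seq(0, by = 2, length.out = length(q))   # times start at zero

sum(t * q) / sum(q)   # time to centroid
weighted.mean(t, q)   # same thing, using the built-in function
```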

### Lag (centre of mass to centre of mass)

The RORB manual (p101) defines the lag (c.m to c.m) as:

> the time from the centroid of concentrated inputs to the channels upstream of the point concerned, including both sub-area inflows and concentrated channel inflow hydrographs, to the time of centroid of the hydrograph.

For the Werribee worked example, the concentrated input is the hydrograph at Melton Reservoir.  This is listed in Figure 3 (the two lines under the green box):

0, 0, 66, 150, 253, …

The time to the centroid of this hydrograph can be calculated using the same procedure as before.

Time to centroid of the input hydrograph at Melton Reservoir is 19.90657  hour.

The difference in time between the centre of mass of the input hydrograph and that of the actual hydrograph is:

24.11662 – 19.90657 = 4.2101 ≈ 4.2 hour

The difference in time between the centre of mass of the input hydrograph and that of the calculated hydrograph is:

23.44448 – 19.90657 = 3.5379 ≈ 3.5 hour

The absolute difference between these values is:

3.5379 – 4.2101 = -0.6722 ≈ -0.7 hour

The percentage difference is the absolute difference divided by the difference in time between the centre of mass of the input hydrograph and that of the actual hydrograph.

100 × -0.672/4.2101 = -15.965% ≈ -16%

So the calculated value is 0.7 hour (16%) smaller than the actual value.  This means the calculated hydrograph is moving through the modelled reach of the Werribee River more quickly than the real hydrograph moves through the real reach.  A modeller could adjust kc to slow the modelled hydrograph (but this would also affect other aspects of the calculated hydrograph, such as the peak).

### Lag to peak

The lag to peak is the time from the centroid of the concentrated input to channels upstream of the point concerned, including both sub-area inflows and concentrated channel inflow hydrographs, to the time of peak of the hydrograph.

For the Werribee worked example, the time to the centroid of the input hydrograph (at Melton Reservoir) is 19.90657 hour and the peak of the actual output hydrograph is at 16 hours so the absolute difference is:

16 – 19.90657 ≈ -3.9 hours

The peak of the calculated hydrograph is also at 16 hours, so the lag to peak of the calculated and actual hydrographs is the same, i.e. the error is zero.

### Conclusion

The range of error statistics in RORB provides a summary of model performance which can be used as a guide during calibration.  Most modellers concentrate on matching peaks and volumes but the other statistics also provide insight into aspects where the model can be improved.

Example calculations are available as an Excel spreadsheet and a gist.

# Detecting outliers in water quality data

This post proposes a simple method to detect outliers in water quality data.  This is based on a proposal by John Tukey as discussed in the Wikipedia article: Outliers.

Tukey proposed a method that could be used to flag outliers based on the interquartile range.  An outlier is any observation that falls outside the interval: $\left[\, Q_1 - k(Q_3 - Q_1),\; Q_3 + k(Q_3 - Q_1) \,\right]$

Where, $Q_1$ and $Q_3$ are the lower and upper quartiles.

Tukey proposed that k = 1.5 could be used to flag outliers, while k = 3 suggests observations that are “far out”.
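As a quick sketch with hypothetical data, the fences and flagging look like this in R:

```r
set.seed(1)
x <- c(rnorm(100), 8)  # 100 well-behaved values plus one obvious outlier

k  <- 1.5
Q1 <- quantile(x, 0.25)
Q3 <- quantile(x, 0.75)
fences <- c(Q1 - k * (Q3 - Q1), Q3 + k * (Q3 - Q1))

x[x < fences[1] | x > fences[2]]  # the value 8 is flagged
```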

A guide to whether the maximum or minimum value in a dataset is an outlier is to calculate its k value: $k_{max} = \frac{\max(O) - Q_3}{Q_3 - Q_1}$ and $k_{min} = \frac{Q_1 - \min(O)}{Q_3 - Q_1}$

The appealing feature of using the interquartile range is that it is resistant to outliers, so there is a reduced chance of the statistic used to flag an outlier being influenced by outliers, an issue known as masking.
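A small demonstration of this resistance (simulated, hypothetical data): adding one extreme value barely moves the interquartile range but inflates the standard deviation.

```r
set.seed(2016)
x <- rnorm(1000)

IQR(c(x, 50)) / IQR(x)  # close to 1: the IQR barely notices the outlier
sd(c(x, 50)) / sd(x)    # well above 1: the sd is strongly affected
```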

A function to calculate the k value for the maximum and minimum of a dataset is as follows:

```r
Tukey_k <- function(x){
  my.quantile <- quantile(x, na.rm = TRUE)
  Q_25 <- my.quantile[2]  # 25th percentile
  Q_75 <- my.quantile[4]  # 75th percentile

  k_max <- as.vector((max(x, na.rm = TRUE) - Q_75)/(Q_75 - Q_25))
  k_min <- as.vector((Q_25 - min(x, na.rm = TRUE))/(Q_75 - Q_25))

  data.frame(k_max = k_max, k_min = k_min)
}
```

Let’s run some tests.

If we calculate the k values for a large normally distributed dataset (say 100,000 observations) they are less than 3.  For these data, getting a value larger than 3 is obviously rare.

```r
set.seed(2016)
Tukey_k(rnorm(100000))
#      k_max    k_min
# 1 2.530309 2.531518
```

For water quality observations, a review by Duncan (1999) suggested the log-normal distribution was a good approximation for all the data he looked at so we need to take logs of the observations before calculating the k values.
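A sketch of why the logs matter, using simulated, hypothetical concentrations: even perfectly log-normal data produce a large k for the maximum on the raw scale, but not after taking logs. The helper below is a cut-down version of the k calculation rather than the Tukey_k function used elsewhere in this post.

```r
set.seed(2016)
wq <- rlnorm(10000)  # simulated log-normal 'concentrations'

# k value for the maximum, as defined above
k_of_max <- function(x) {
  q <- quantile(x, na.rm = TRUE)
  as.vector((max(x, na.rm = TRUE) - q[4]) / (q[4] - q[2]))
}

k_of_max(wq)       # well above 3 on the raw scale
k_of_max(log(wq))  # much smaller once logs are taken
```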

For some data on Nickel concentration I’m working on, the k for the maximum value is 4.3.  Clearly a ‘far out’ value and one that deserves checking, via a histogram or QQ plot (see below).  Of the 12,000 values in this dataset there are clearly 15 or so that are very large (relative to the bulk of the data).

Code is available as a gist.

# Rating curve resources

Hydrologists are often interested in the highest flows and that means we are using the upper limits of rating tables where uncertainty is large.

Below are some links that explain how rating curves are developed and potential issues with current practice.

Fenton, J. D. and Keller, R. J. (2001) The calculation of streamflow from measurements of stage.  Cooperative Research Centre for Catchment Hydrology.  Technical report 01/6.  (link)

USGS

• How streamflow is measured
• Buchanan, T. J. and Somers, W. P. (1969) Discharge measurements at gaging stations.  U. S. Geological Survey Techniques of Water-Resources Investigations Book 3, Chapter A8, pp. 65. (link)

John Fenton’s papers

The NZ Hydrological Society Ratings Workshop 2016

Stu Hamilton’s blog at Aquatic Informatics

Stu Hamilton’s Whitepaper: 5 Best Practices for Building Better Stage-Discharge Rating Curves.

### Journal articles

Van Eerdenbrugh, K., Van Hoey, S. and Verhoest, N. E. C. (2016) Identification of temporal consistency in rating curve data: Bidirectional Reach (BReach). Water Resources Research 52(8):6277-6296. (link)

Steinbakk, G. H., Thorarinsdottir, T. L., Reitan, T., Schlichting, L., Holleland, S. and Engeland, K. (2016) Propagation of rating curve uncertainty in design flood estimation.  Water Resources Research. DOI: 10.1002/2015WR018516  (link).

Kuczera, G. (1996) Correlated rating curve error in flood frequency inference.  Water Resources Research.  32(7):2119-2127 (link)

Mansanarez, V., Le Coz, J., Renard, B., Lang, M., Pierrefeu, G. and Vauchel, P. (2016) Bayesian analysis of stage-fall-discharge rating curves and their uncertainties.  Water Resources Research. DOI: 10.1002/2016WR018916 (link)

Tomkins, K. M. (2012) Uncertainty in streamflow rating curves: methods, controls and consequences.  Hydrological Processes 28(3):464-481. (link)

### Other articles

Ward, B and Thomas, L. (2008) Does your rating curve hold water? The consequences of rating curve errors. ANCOLD Conference (link)