The general equation for flood quantile estimation (from Chow, 1951) is:

Q_T = m + K_T × s    (1)

Where K_T is the frequency factor, which depends on the distribution of the flood data and the probability of interest, and m and s are the mean and standard deviation of the flood series.

The classical frequency factor for the log-normal distribution is:

K_T = (1/Cv) × [exp(z × √(ln(1 + Cv²)) − 0.5 × ln(1 + Cv²)) − 1]    (2)

Where z is the standard normal deviate corresponding to the exceedance probability of interest and Cv = s/m is the coefficient of variation. The standard normal deviate can be calculated, for example, using the `qnorm` function in R. So for the 100-year flood, the exceedance probability is 1/100 and z = qnorm(1 − 0.01) = 2.326.

An R function to calculate this log-normal frequency factor is:

# Frequency factor for the log-normal distribution
# m - mean
# s - standard deviation
# p - exceedance probability

FF_LogNormal <- function(m, s, p) {
  cv <- s/m
  z <- qnorm(1 - p)
  (1/cv) * (exp(sqrt(log(1 + cv^2)) * z - 0.5 * log(1 + cv^2)) - 1)
}

Kite (2004) provides an example we can use for testing.

Annual maximum discharges of the Saint John River at Fort Kent, New Brunswick for the 37 years from 1927 to 1963 have a mean of 81,000 cfs and a standard deviation of 22,800 cfs. Estimate the 100-year flood.

As calculated below, Q100 ≈ 148,000 cfs.

m <- 81000
s <- 22800
p <- 0.01

Kt <- FF_LogNormal(m, s, p) # 2.943
(Q100 <- m + Kt * s)        # [1] 148221.3

If flood data are log-normally distributed, that means the logarithms of the flood data are normally distributed. This suggests there are two ways of calculating the flood quantiles. We can use the data as-is along with the log-normal frequency factor (as in the above example), or take the logs of the data and use the frequency factor from the normal distribution.

Continuing with the example from Kite, the mean and standard deviation of the logarithms of the 37 events from the Saint John River are 11.263 and 0.284 (flows in cfs).

The frequency factor for the 100-year flood, based on the normal distribution, is 2.326, so the 100-year flood estimate is 150,800 cfs – similar to the previous example but not exactly the same.

m.log <- 11.263
s.log <- 0.284

Kt <- qnorm(1 - 0.01)           # 2.326348
(Q100 <- exp(m.log + Kt*s.log)) # [1] 150795.9

Let's repeat this analysis using some Australian data. An annual flood series for the Hunter River at Singleton is available here.

The two estimates for the 100-year flood are 10,500 cumec using data as-is and 13,800 cumec when calculating in the log domain.

# Hunter River at Singleton
library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'
singleton <- source_data(my.url)
str(singleton)

m <- mean(singleton$`Peak (cumec)`) # 1401.7
s <- sd(singleton$`Peak (cumec)`)   # 2312.9
Kt <- FF_LogNormal(m, s, 0.01)      # 3.917
(Q100 <- m + Kt * s)                # 10460.5

m.log <- mean(log(singleton$`Peak (cumec)`)) # 6.425
s.log <- sd(log(singleton$`Peak (cumec)`))   # 1.336
Kt <- qnorm(1 - 0.01)                        # 2.326
(Q100 <- exp(m.log + Kt*s.log))              # 13818.9

Kuczera (1999) derives a frequency factor for the log-normal distribution which takes account of the uncertainty in the parameters – the mean and standard deviation – which must be estimated from the flood data. Assuming nothing is known about these parameters other than what we learn from the flood measurements (i.e. the Bayesian prior is noninformative), the frequency factor is:

K_T = t(1 − p, n − 1) × √(1 + 1/n)

Where t(1 − p, n − 1) is the 1 − p quantile of Student's t-distribution with n − 1 degrees of freedom, and n is the number of data points.
Using this frequency factor results in more conservative (higher) flood quantile estimates. As the sample size increases, the Bayesian frequency factor approaches the frequency factor based on the normal distribution.

An R function to calculate this Bayesian log-normal frequency factor is:

# Bayesian frequency factor for the log-normal distribution
# n - number of data points
# p - exceedance probability

FF_LogNormal_Bayes <- function(n, p) {
  qt(1 - p, n - 1) * sqrt(1 + 1/n)
}
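As a quick check of the convergence claim, we can evaluate the Bayesian frequency factor for increasing record lengths and compare it with the normal-distribution limit (the function definition is repeated here so the snippet is self-contained):

```r
# Bayesian frequency factor for the log-normal distribution
# (noninformative prior)
FF_LogNormal_Bayes <- function(n, p) {
  qt(1 - p, n - 1) * sqrt(1 + 1/n)
}

# Bayesian frequency factors for p = 0.01 and increasing record lengths;
# the values decrease towards the normal-distribution frequency factor
sapply(c(10, 30, 100, 1000), FF_LogNormal_Bayes, p = 0.01)

# The limit: the frequency factor from the normal distribution
qnorm(1 - 0.01) # 2.326348
```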

Calculate the Bayesian estimate for the 100-year flood quantile using the data for the Hunter River at Singleton.

Calculations are below. The estimate is 17,350 cumecs.

library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'
singleton <- source_data(my.url)

m.log <- mean(log(singleton$`Peak (cumec)`))    # 6.425
s.log <- sd(log(singleton$`Peak (cumec)`))      # 1.336
Kt <- FF_LogNormal_Bayes(nrow(singleton), 0.01) # 2.4966
(Q100 <- exp(m.log + Kt*s.log))                 # 17348

So we have three flood quantile estimators for log-normally distributed data:

- Take logs and use the frequency factor from the normal distribution
- Use the data as-is and calculate the frequency factor using equation 2.
- Take logs and use the Bayesian frequency factor.

If the data set is large, all three methods give essentially the same result.

set.seed(2016)
my.flood <- rlnorm(1e6, 6, 1) # generate 1 million flood values from a log-normal distribution

# True value
exp(6 + 1*qnorm(1 - 0.01)) # 4131.302

# Direct calculation of the quantile
quantile(my.flood, probs = 0.99, type = 8) # 4114.213

# Taking logs and using the normal distribution frequency factor
Q100.log <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - 0.01)
exp(Q100.log) # 4130.316

# Using the frequency factor from equation 2
Q100 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), 0.01)
Q100 # 4127.146

# Bayesian frequency factor
Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), 0.01)
exp(Q100.Bayes) # 4130.336

For small datasets, the estimates vary substantially.

First, let’s make a function that simulates 30 years of log-normal flood peaks and calculates the 100-year quantile using each of the three methods.

Test_f <- function(i, p) {
  # i - dummy variable so the function can be included in a loop
  # p - exceedance probability
  my.flood <- rlnorm(30, 6, 1) # generate 30 years of flood data

  Q100.normal <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - p)
  Q100.eq2 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), p)
  Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), p)

  data.frame(Q100.normal = exp(Q100.normal),
             Q100.eq2 = Q100.eq2,
             Q100.Bayes = exp(Q100.Bayes))
}

Now, we’ll call this multiple times and compare the average quantiles for the three methods with the true quantile.

set.seed(5)
out <- lapply(1:10000, Test_f, p = 0.01)
out.df <- do.call(rbind, out)

colMeans(out.df)
# Q100.normal    Q100.eq2  Q100.Bayes
#    4334.727    3678.353    5204.641

# True quantile: 4131

So, for this data set, on average, the quantile based on taking the logs and using the frequency factor from the normal distribution is about right. The quantile based on equation 2 is too low, and the quantile using the Bayesian frequency factor is too high.

In another post, we’ll look at quantifying the uncertainty in quantile estimates.

Kite, G. W. (2004) Frequency and risk analysis in hydrology. Water Resources Publications, LLC.


The urban water balance can be written as:

ΔS = P + I − E_a − R_s − R_w

where:

- ΔS is the change in catchment storage
- P is precipitation
- I is imported water
- E_a is actual evapotranspiration
- R_s is stormwater runoff
- R_w is wastewater discharge

Mitchell et al., (2003) provides data on the water balance for Curtin, ACT for 1979 to 1996. The water balance for the average, wettest and driest years are shown in the table below.

When presenting financial statements, a common approach is to use a waterfall chart which shows how the components of a financial balance contribute to an overall result. Here I’ve used a waterfall chart to show the water balance for Curtin for the driest and wettest year as reported by Mitchell et al., (2003).

Does this approach to visualising a water balance help understanding? A few things stand out:

- In the driest year, more water was input from the mains than from rainfall
- In the driest year, actual evapotranspiration was larger than rainfall and mains inputs.
- Evapotranspiration and stormwater change with climate, with large variation between the wet and dry years. Wastewater doesn’t change all that much.
- Precipitation is highly variable, ranging from 247 mm to 914 mm.

There is a guide to making a waterfall chart in Excel here. The R code to produce the graphs shown in this post is available as a gist, which draws on this blog.

Mitchell, V. G., T. A. McMahon and R. G. Mein (2003) Components of the Total Water Balance of an Urban Catchment. *Environmental Management* **32**(6): 735-746. (link)


The following steps can be used to extract and convert the data into a useable format.

1. Download and save rating table. Click the button shown to get the rating table as a text file.

2. Re-format the data to create columns of levels and flows. You’ll need to use your favourite tool for this munging step. An example using R is available as a gist.

3. Plot and compare with the online version

4. Save as a csv file for further use.

R code is available here.
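A sketch of the munging in step 2, using a mock rating table in place of the downloaded file (the layout of the real file will differ, so treat the skip and column settings as assumptions):

```r
# Mock rating table text standing in for the downloaded file
lines <- c("Station: example only",
           "Level  Flow",
           "0.5    1.2",
           "1.0    5.8",
           "1.5   15.4")
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)

# Re-format into columns of levels and flows
rating <- read.table(tf, skip = 2, col.names = c("level_m", "flow_cumec"))

# Plot to compare with the online version
plot(rating$level_m, rating$flow_cumec, type = "b",
     xlab = "Level (m)", ylab = "Flow (cumec)")

# Save as a csv file for further use
write.csv(rating, sub("txt$", "csv", tf), row.names = FALSE)
```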


Definitions:

- EY – Number of exceedances per year
- AEP – Annual exceedance probability
- AEP (1 in x) – 1/AEP
- ARI – Average Recurrence Interval (years)

For floods rarer than 5% AEP, the relationship between the various frequency descriptors can be estimated by the following straightforward approximations:

EY ≈ AEP ≈ 1/ARI

For common events, more complex equations are required (these will also work for any frequency):

EY = −ln(1 − AEP)

ARI = 1/EY = −1/ln(1 − AEP)

A key result is that we can’t use the simple relationship ARI = 1/AEP for frequent events. So, for example, the 50% AEP event is not the same as the 2-year ARI event.

For an ARI of 5 years, what is the AEP?

AEP = 1 − exp(−1/5) = 0.18 (18%)

For an AEP of 50%, what is the ARI?

ARI = −1/ln(1 − 0.5) = 1.44 years

R functions and example calculation available as a gist.
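These conversions can be sketched as R functions (my own minimal versions, assuming the standard Poisson model of event arrivals; the gist has fuller code):

```r
# Convert between flood frequency descriptors assuming a Poisson model
# of event arrivals: EY = -ln(1 - AEP) and ARI = 1/EY
aep_from_ari <- function(ari) 1 - exp(-1/ari)
ari_from_aep <- function(aep) -1/log(1 - aep)

aep_from_ari(5)   # about 0.18 - not the 0.2 given by the simple 1/ARI rule
ari_from_aep(0.5) # about 1.44 years - not 2 years
```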


- It's often challenging to get good calibrations for all the available historical events, and there may be good reasons why.

Difficulties in calibrating a model to observed flood events of different magnitude should be taken as an indication of the changing role of processes.

In many cases a significant change occurs between floods that are mostly contained within the stream channel and floods in which floodplain storage plays an important role in the routing process.

If the model has only been calibrated to in-bank floods, confidence in its ability to represent larger floods will be lower.

- Calibration needs to focus on what the model is to be used for, not just ensuring past events are well represented.

The focus of model calibration is not just to develop a model that is well calibrated to the available flood data. Application of the model to the design requirements must be the primary focus.

It is often the case that calibration floods are relatively frequent while design applications require much rarer floods. In this case, work in refining the model calibration to the frequent floods may not be justified.

Parameter values should account for the expected future design conditions, rather than an unrepresentative calibration event.

Calibration usually works with historic flood events while the design requirements are for probabilistic events. The parameters calculated for the historic events may not be applicable to the design flood events.

- On using all available data.

Even if the data is of poor quality or incomplete, it is important that the model calibration be at least consistent with the available information.

Even poor quality observations may be sufficient to apply a ‘common sense test’.

…at least ensure that model performance is consistent with minimal data [available]…

- On inconsistent data

Effort should be concentrated on resolving the source of the inconsistency rather than pursuing further calibration.

- Dealing with poor calibration.

It is far more important to understand why a model may not be calibrating well at a particular location than to use unrealistic parameter values to ‘force’ the model to calibrate.

- Don’t expect your model to provide a good fit to all data.

It is extremely unlikely that your simple model perfectly represents the complex real world, that all your data have been collected without error, or that they are unaffected by local factors.

- The appearance of great calibrations may mean:

The model has been overfitted to the data with unrealistic parameter values, or

Some of the data, that does not fit well, has been ignored or not presented.

- Checking adopted parameters.

Calibration events should be re-run with adopted parameters and results should show at least reasonable performance for all of the calibration events.

- Confirming model suitability for design events

Model performance, for design events, should be confirmed using Flood Frequency Analysis results, if available, or regional flood frequency information.

Book 7 also has worthwhile guidance on uncertainty analysis, model checking and reporting.


- Superscripts in y-axis labels
- Probability scale on x-axis
- Labelling points on the x-axis that are different to the plotted values i.e. we are plotting the normal quantile values but labelling them as percentages
- Adding a title to the legend
- Adding labels to the legend
- Positioning the legend on the plot
- Choosing colours for the lines
- Using commas as a thousand separator.

Code is available as a gist, which also shows how to:

- Enter data using the tribble function, which is convenient for small data sets
- Change the format of data to one observation per row using the tidyr::gather function.
- Use a log scale on the y-axis
- Plot a secondary axis showing the AEP as 1 in X years
- Use the Probit transformation for the AEP values
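As an illustration of several of these features, here is a stripped-down sketch using made-up flood frequency estimates (the data values and axis choices are my own, not from the gist):

```r
library(ggplot2)
library(scales)

# Hypothetical flood frequency estimates (AEP as a fraction, discharge in cumec)
flood <- data.frame(aep = c(0.5, 0.2, 0.1, 0.05, 0.02, 0.01),
                    q   = c(1200, 2500, 3600, 4900, 6900, 8600))

aep.breaks <- c(0.5, 0.2, 0.1, 0.05, 0.02, 0.01)

p <- ggplot(flood, aes(x = qnorm(1 - aep), y = q)) + # plot normal quantiles (probit scale)
  geom_line() +
  geom_point() +
  scale_x_continuous(name = "AEP (%)",
                     breaks = qnorm(1 - aep.breaks), # breaks at the quantile values...
                     labels = aep.breaks * 100) +    # ...labelled as percentages
  scale_y_log10(name = expression(Discharge~(m^3~s^-1)), # superscripts in the y-axis label
                labels = comma)                          # comma as thousand separator
p
```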


There is already evidence that rainfall intensity for short duration storms is increasing, which could lead to more frequent and larger flash floods. This is a particular issue in towns and cities because small urban catchments are especially vulnerable.

In the corporate world, consideration of climate change is being taken seriously. The recent Hutley opinion found that many climate change risks “would be regarded by a Court as being foreseeable at the present time” and that Australian company directors “who fail to consider ‘climate change risks’ now, could be found liable for breaching their duty of care and diligence in the future”.

The Task Force on Climate Related Financial Disclosures (TCFD), chaired by Michael Bloomberg, has recently released recommendations on how companies should report on climate change risks. This includes the need to report on risks of “Increased severity of extreme weather events such as cyclones and floods” and “Changes in precipitation patterns and extreme weather variability”.

In the Australian flood scene, the latest Handbook 7 – *Managing the floodplain: a guide to best practice in flood risk management in Australia* – provides advice on assessing and reporting on climate change risk. But the accompanying project brief template and guide describe climate change aspects of a flood investigation as optional. The latest version of *Australian Rainfall and Runoff* provides recommended approaches to assessing climate change impacts on flooding, but recent research argues these methods are too conservative.

On a positive note for Victoria, the Floodplain Management Strategy does encourage consideration of climate change (Policy 9A):

Flood studies prepared with government financial assistance will consider a range of floods of different probabilities, and the rarer flood events will be used to help determine the location’s sensitivity to climate change. Further climate change scenarios may be considered where this sensitivity is significant.

Flood investigations lead on to decisions about land use zoning and design of mitigation works. Are climate change risks to these measures foreseeable at the present time? If so, then they should be considered and reported on.

Clearly this is an area where knowledge and ideas are changing rapidly. Practising hydrologists need to keep up with latest methods, and managers and boards of floodplain management authorities need to be aware of the latest thinking on governance, risk management, and disclosure.


- What Do Floodplain Managers Do Now That Australian Rainfall and Runoff Has Been Released? – Monique Retallick, WMAwater.
- Australian Rainfall and Runoff: Case Study on Applying the New Guidelines – Isabelle Testoni, WMAwater.
- Impact of Ensemble and Joint Probability Techniques on Design Flood Levels – David Stephens, Hydrology and Risk Consulting.

There was also a workshop session where software vendors and maintainers discussed how they were updating their products to become compliant with the new ARR.

A few highlights:

1. The ARR team are working on a single temporal pattern that can be used with hydrologic models to get a preliminary and rapid assessment of flood magnitudes for a given frequency. This means an ensemble or Monte Carlo approach won’t be necessary in all cases but is recommended for all but very approximate flood estimates.

2. The main software vendors presented on their efforts to incorporate ARR2016 data and procedures into models. This included RORB, URBS, WBNM and RAFTS; DRAINS has also included functionality. All the models use similar approaches, but speakers acknowledged further changes were likely as we learn more about the implications of ARR2016. The modelling of spatial rainfall patterns did not seem well advanced, as most programs only accept a single pattern and so don't allow for the influence of AEP and duration.

3. WMA Water have developed a guide on how to use ARR2016 for flood studies. This has been done for the NSW Office of Environment and Heritage (OEH) and looks to be very useful as it includes several case studies. The guide is not yet publicly available but will be provided to the NFRAG committee, so it may be released.

4. Hydrologists need to take care when selecting the hydrograph, from the ensemble of hydrographs, to use for hydraulic modelling. A peaked, low-volume hydrograph may end up being attenuated by hydraulic routing. We need to look at the peaks of the ensemble of hydrographs as well as their volumes. The selection of a single design hydrograph from an ensemble of hydrographs was seen as an area requiring further research.

5. Critical duration – The identification of a single critical duration is often much less obvious now we are using ensemble rainfall patterns. It seems that many durations produce similar flood magnitudes. The implications of this are not yet clear. Perhaps if the peaks are similar, we should consider hydrographs with more volume as they will be subject to less attenuation from further routing.

6. There was lots of discussion around whether we should use the mean or median of an ensemble of events. The take away message was that in general we should be using the median of inputs and mean of outputs.

7. When determining the flood risk at many points in a large catchment, different points will have different critical durations. There was talk of “enveloping” the results. This is likely to be an envelope of means rather than extremes.

8. The probabilistic rational method, previously used for rural flood estimates in ungauged catchments, is no longer supported. The RFFE is now recommended.

9. The urban rational method will only be recommended for small catchments such as a “two lot subdivision”.

10. There was no update on when a complete draft of ARR Book 9 would be released.

11. Losses should be based on local data if there is any available. This includes estimating losses by calibration to a flood frequency curve. Only use data hub losses if there is no better information. In one case study that was presented, the initial loss was taken from the data hub and the continuing loss was determined by calibration to a flood frequency curve.

12. NSW will not be adopting the ARR2016 approach to the interaction of coastal and riverine flooding. Apparently their current approaches are better and have an allowance for entrance conditions that are not embedded in the ARR approach.

13. NSW will not be using ARR approaches to estimate the impacts of climate change on flooding. Instead they will use NARCLIM.

14. NSW have mapped the difference between the 1987 IFD and the 2016 IFD rainfalls and use this to assist in setting priorities for undertaking flood studies.

15. A case study was presented for a highly urbanized catchment in Woolloomooloo. There was quite an involved procedure to determine the critical duration for all points in the catchment and the temporal patterns that led to the critical cases. Results using all 10 patterns were mapped, gridded and averaged. I didn’t fully understand the approach as presented but there may be more information in the published version of Isabelle Testoni’s paper once it becomes available.

There is still much to learn about the new Australian Rainfall and Runoff and much to be decided. The papers at the FMA conference were a big help in understanding how people are interpreting and responding to the new guideline.


Actual evapotranspiration (AET) is shown to be a highly significant predictor of the net annual above-ground productivity in mature terrestrial plant communities. Communities included ranged from deserts and tundra to tropical forests. It is hypothesized that the relationship of AET to productivity is due to the fact that AET measures the simultaneous availability of water and solar energy, the most important rate-limiting resources in photosynthesis.

As a hydrologist I knew about actual evapotranspiration (evaporation plus transpiration) but hadn't paid attention to the link with productivity. To an ecologist, productivity refers to the rate of biomass production through photosynthesis – where inorganic molecules, like water and carbon dioxide, are converted to organic material. Productivity can be measured as mass per unit area per unit time, e.g. g m⁻² d⁻¹.

In Australia, actual evapotranspiration is mapped by the Bureau of Meteorology (Figure 1). There are high values along the coast north of Brisbane, Cape York and ‘The Top End‘. If Rosenzweig’s correlations hold, these areas are the most ecologically productive in Australia. In Victoria the highest AET is around Warrnambool, Gippsland and, particularly, a small area on the east coast near Mallacoota. Many of the areas with highest AET are heavily forested.

Rosenzweig quantified the relationship between AET and productivity:

log10(NAAP) = 1.66 × log10(AET) − 1.66

Where:

- NAAP is the net annual above-ground productivity in grams per square meter.
- AET is annual actual evapotranspiration in mm.

Rosenzweig also provides 95% confidence intervals for the slope and intercept.
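As a rough calculator using Rosenzweig's regression (I'm taking the slope and intercept as 1.66 and −1.66 on a log10 scale; treat these exact values as an assumption and check the paper before relying on them):

```r
# Predicted net annual above-ground productivity (g/m^2) from annual AET (mm),
# assuming log10(NAAP) = 1.66*log10(AET) - 1.66
naap <- function(aet_mm) 10^(1.66 * log10(aet_mm) - 1.66)

naap(1000) # around 2100 g/m^2 - a high-AET coastal forest
naap(250)  # a much lower figure for an arid site
```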

Rosenzweig’s paper was published in 1968 and the relationship between AET and productivity is better understood now (e.g. Jasechko, S. et al., 2013). But the simple relationship between AET and productivity does provide an interesting perspective on the Australian landscape.

Rosenzweig, M. L. (1968) Net Primary Productivity of Terrestrial Communities: Prediction from Climatological Data. *The American Naturalist* **102**(923): 67-74. DOI: 10.1086/282523 (link).

Jasechko, S., Sharp, Z., Gibson, J., Birks, S., Yi, Y. and Fawcett, P. (2013) Terrestrial water fluxes dominated by transpiration. Nature 496(7445):347-350 (link).


From Wikipedia:

If a certain event did not occur in a sample with n subjects, the interval from 0 to 3/n is a 95% confidence interval for the rate of occurrences in the population.

For example, if a levee hasn’t been overtopped since it was built 100 years ago, then we can conclude with 95% confidence that the annual probability of overtopping is less than 3/100, i.e. overtopping occurs less often than about 1 year in 33. Alternatively, the 95% confidence interval for the Annual Exceedance Probability of the flood that would cause overtopping is between 0 and 3/100 (3%). Of course, you may be able to get a better estimate of the confidence interval if you have other data such as a flow record, information on water levels and the height of the levee.

The rule of 3 provides a reasonable estimate for *n* greater than 30.
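The approximation is easy to check against the exact result: with zero occurrences in n independent trials, the exact one-sided 95% upper bound on the event probability is 1 − 0.05^(1/n), found by solving (1 − p)^n = 0.05.

```r
# Rule of three vs the exact 95% upper bound, for 100 years with no overtopping
n <- 100
exact  <- 1 - 0.05^(1/n) # exact upper bound: solve (1 - p)^n = 0.05
approx <- 3/n            # rule-of-three approximation
c(exact = exact, approx = approx) # 0.0295 vs 0.0300
```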
