Category Archives: R

ARR2019 – Areal Reduction Factors

With the publication of the 2019 edition of Australian Rainfall and Runoff, most, but not all, of the ambiguities around the calculation of Areal Reduction Factors have been resolved.

Equations are provided that allow calculation of Areal Reduction Factors (ARFs) for:

  • Short durations

Durations less than 12 hours; a single equation for the whole of Australia

  • Long durations

Durations between 24 hours and 168 hours, with different coefficients required for each of 10 regions (Figure 1).

These equations, and the long duration coefficients, are available from the data hub and from Australian Rainfall and Runoff (ARR) (Book 2, Chapter 4.3).  ARR refers to ‘coefficients’ and the data hub to ‘parameters’, but they are the same thing.


Figure 1: ARF regions

What the data hub doesn’t make clear is that the procedure used to calculate an ARF depends on both duration and area.  Details are in ARR Book 2, Table 2.4.1.  There are 11 separate cases to consider, which I list in 5 groups below.

1. Very small catchments:

1.1. Catchment area ≤ 1 km2, ARF = 1 for any duration

2. Very large catchments:

2.1. Catchment area > 30,000 km2, ARF cannot be calculated using the generalised equations

3. Catchments between 1000 km2 and 30,000 km2

3.1. Short durations: for duration ≤ 12 hours, ARF cannot be calculated using the generalised equations

3.2. Long durations: for duration ≥ 24 hours calculate ARF using the long duration equation

3.3. Between long and short durations (between 12 and 24 hours): interpolate between the long duration and short duration ARFs.  So, although it is not valid to use the short duration ARFs for catchments of this size, the guidance suggests the 12 hour short duration ARF can be used as one terminal in the required interpolation.

4. Catchments between 10 km2 and 1000 km2

4.1. Short durations: use the short duration equation for durations ≤ 12 hours

4.2. Long durations: use the long duration equation for durations ≥ 24 hours

4.3. Between long duration and short duration: interpolate

5. Catchments between 1 km2  and 10 km2

5.1. Short duration: interpolate for the area between an ARF of 1 at 1 km2 and the short duration ARF for 10 km2.

5.2. Long duration: interpolate for the area between an ARF of 1 at 1 km2 and the long duration ARF for 10 km2.

5.3. Between long duration and short duration: Interpolate for the duration between the long duration ARF and short duration ARF for a catchment of 10 km2.  Then interpolate for the area between an ARF of 1 at 1 km2 and the value for a 10 km2 catchment.

Note that for the area-based interpolations in 5.1 and 5.2, equation 2.4.4 is required (below).  For the duration-based interpolations, equation 2.4.3 should be used.  There is an error in Table 2.4.1 in ARR where the wrong interpolation formula is referred to.

\mathrm{ARF} = \mathrm{ARF}_{\text{12 hour}} + \left(\mathrm{ARF}_{\text{24 hour}} - \mathrm{ARF}_{\text{12 hour}}\right)\frac{(\text{duration} - 720)}{720}     (2.4.3)

\mathrm{ARF} = 1 - 0.6614\left(1 - \mathrm{ARF}_{\text{10 km}^2}\right)\left(A^{0.4} - 1\right)     (2.4.4)
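To make the case structure concrete, here is a minimal R sketch of the two interpolation formulas and the dispatch logic above.  The functions arf_short() and arf_long() stand in for the generalised short and long duration equations, which are not reproduced here; durations are in minutes and areas in km2.

# Equation 2.4.3: interpolate on duration between 12 hours (720 min)
# and 24 hours (1440 min)
arf_duration_interp = function(arf_12, arf_24, duration) {
  arf_12 + (arf_24 - arf_12) * (duration - 720) / 720
}

# Equation 2.4.4: interpolate on area between 1 km2 (ARF = 1) and 10 km2
arf_area_interp = function(arf_10km2, area) {
  1 - 0.6614 * (1 - arf_10km2) * (area^0.4 - 1)
}

# Case structure of ARR Book 2, Table 2.4.1.  arf_short() and arf_long()
# are assumed implementations of the generalised equations.
arf = function(area, duration, arf_short, arf_long) {
  if (area <= 1) return(1)                                   # case 1
  if (area > 30000) stop('generalised equations not valid')  # case 2
  if (area >= 10) {                                          # cases 3 and 4
    if (duration <= 720) {
      if (area > 1000) stop('short durations not valid above 1000 km2')
      return(arf_short(area, duration))
    }
    if (duration >= 1440) return(arf_long(area, duration))
    return(arf_duration_interp(arf_short(area, 720),
                               arf_long(area, 1440), duration))
  }
  # case 5: 1 to 10 km2, interpolate on area from ARF = 1 at 1 km2
  arf_area_interp(arf(10, duration, arf_short, arf_long), area)
}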

Another thing to be careful of is that unrealistic negative values are calculated for large catchments and short durations.  For example, the ARF for a 1000 km2 catchment, 1 minute duration and AEP of 1% in the Southern Temperate zone is -0.79.  Of course, most practitioners are not interested in situations like this, but if a Monte Carlo approach is used, these odd results may come up unless the parameter bounds are set carefully.

When setting up an ARF spreadsheet or script it is probably worth setting any values less than zero to zero.  But also check that your hydrologic model won’t crash if ARF is zero.
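For example, in R:

arf_values = c(0.93, 0.4, -0.79)   # e.g. output of the ARF equations
arf_values = pmax(arf_values, 0)   # clamp any negative ARFs to zero
# [1] 0.93 0.40 0.00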

The smallest duration considered in the derivation of the ARFs was 60 min, so anything shorter than that is an extrapolation (Stensmyr et al., 2014).  If the critical case ends up being less than 60 min, check that the ARFs are realistic.

Example

There is a worked example in ARR Book 2, Chapter 6.5.3.

Region = East Coast North,  Area = 245.07 km2, AEP = 1%, Duration = 24 hour (1440 min)

ARF = 0.929

ARF calculator

I’ve developed a simple ARF calculator as a web app here. The code is available as a gist.

Test cases

The test cases I used when developing the ARF calculator are here.  This gives ARFs for a range of AEPs, durations and areas that correspond to the 11 cases listed above, along with other checks.  I calculated these manually.  Please let me know if you think any are incorrect.

References

Stensmyr, P., Babister, M. and Retallick, M. (2014) Spatial patterns of rainfall.  Australian Rainfall and Runoff Revision Project 2. http://arr.ga.gov.au/__data/assets/pdf_file/0020/40556/ARR_Project2_Report_Short_ARF.pdf


Tidy Pre-burst data

Pre-burst rainfall data can be obtained from the ARR data hub but it’s not in a form that is easy to work with.  It’s not ‘tidy’.  Tidy data has one observation per row, with information in neighbouring columns about that observation.  Once the data is in tidy format, it’s easy to use tools in R, or pivot tables in Excel, to undertake analysis.  You spend a lot less time munging tidy data than you do in the usual case when data is messy.

A tidy data structure for pre-burst data would look like the table below.  ‘Type’ can be either depth or ratio; available percentiles are 10, 25, 50, 75 and 90; standard AEPs are 1, 2, 5, 10, 20 and 50 (%).  The value is either a depth in mm or the pre-burst to burst ratio.  The value is the observation; everything else is information about the observation.

Duration (min)   Duration (hour)   Type    Percentile   AEP   Value
60               1                 depth   10           50    0
60               1                 ratio   10           50    0
2880             48                depth   90           1     15.1
2880             48                ratio   90           1     0.105
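Getting from the hub’s wide layout to this shape is straightforward with tidyr.  Here is a minimal sketch, assuming the raw hub output has been read into a wide table with one column per AEP (the column names and values below are illustrative):

library(dplyr)
library(tidyr)

# Illustrative wide table: one row per duration/type/percentile,
# one column per AEP
raw = tibble(
  dur_min    = c(60, 2880),
  dur_hour   = c(1, 48),
  type       = c('depth', 'depth'),
  percentile = c(10, 90),
  AEP_50     = c(0, 1.1),
  AEP_1      = c(31.8, 15.1)
)

# Stack the AEP columns into one observation per row
tidy = raw %>%
  pivot_longer(starts_with('AEP_'),
               names_to = 'AEP',
               names_prefix = 'AEP_',
               names_transform = list(AEP = as.numeric),
               values_to = 'value')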

I’ve written a function, Get_tidy_prebust, that will return the pre-burst data in tidy format given the latitude and longitude of a location.  After that it’s easy.  Below is an example for Axe Creek: first we get the tidy data, then graph it in a variety of formats.  See the gist for all the details.

library(dplyr)
library(ggplot2)

Axe = Get_tidy_prebust(lat = -36.9, lon = 144.36)
Axe %>%
  filter(type == 'ratio') %>%
  ggplot(aes(x = dur_hour, y = preburst, colour = factor(percentile))) +
  geom_line() +
  geom_point() +
  facet_wrap(~AEP) +
  labs(x = 'Duration (hour)',
       y = 'Preburst ratio') +
  scale_colour_discrete(name = 'Percentile')

Figure 1: Relationship between pre-burst ratio and duration as a function of AEP and percentile


Figure 2: Relationship between pre-burst ratio and percentile as a function of AEP and duration


Scraping the data hub

The Australian Rainfall and Runoff data hub provides information to support modelling of design floods.  The usual way to get the required data is via a web interface but it is also possible to scrape the data.  Instructions are here under the heading ‘Advanced Use:’.

In R the getURI function from the RCurl package can be used, together with glue to insert the coordinates.  A basic command is:

library(glue)
RCurl::getURI(glue("https://data.arr-software.org/?lon_coord={lon}&lat_coord={lat}&type=text&Preburst=1&OtherPreburst=1"))

where ‘lon’ and ‘lat’ are the longitude and latitude of the location of interest.  This will return a text file with information on median preburst (Preburst=1) and preburst for other percentiles (OtherPreburst=1).  The text file can then be processed to extract the required information.  Many other options are available.  For example, if All=1 is used, information on a large number of parameters is returned, including rainfall temporal patterns.
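As a rough sketch of that processing step, the returned text can be split into lines and a section pulled out between bracketed markers.  The marker names below are illustrative assumptions; inspect the actual file returned for your location:

library(RCurl)
library(glue)

lat = -36.9
lon = 144.36
txt = getURI(glue("https://data.arr-software.org/?lon_coord={lon}&lat_coord={lat}&type=text&Preburst=1"))

# Assumed section markers; check the downloaded text to confirm
txt_lines = strsplit(txt, "\n")[[1]]
start = grep("\\[PREBURST\\]", txt_lines)[1]
end = grep("\\[END_PREBURST\\]", txt_lines)[1]
preburst_block = txt_lines[(start + 1):(end - 1)]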

As an example, I’ve written a function, Get_preburst, which extracts the preburst information for a given percentile, latitude and longitude.  It is also necessary to specify whether depths or ratios are required.  Example usage is shown below.

Get_preburst(percentile = 90, type = 'depth', lat = -36.9, lon = 144.36)   


   dur_min dur_hour AEP_50 AEP_20 AEP_10 AEP_5 AEP_2 AEP_1
 1      60      1     22.3   25.2   27.1  29    30.6  31.8
 2      90      1.5   24.2   25.4   26.1  26.9  31.5  35  
 3     120      2     27.2   33.4   37.5  41.5  43.6  45.2
 4     180      3     23.3   27.1   29.6  32    43.4  52  
 5     360      6     19.5   24.3   27.5  30.6  42.5  51.4
 6     720     12     12.8   20.3   25.3  30.1  34.4  37.6
 7    1080     18     15.6   18.1   19.7  21.3  25.7  29  
 8    1440     24     14.6   18.8   21.6  24.2  24.3  24.4
 9    2160     36      9     11.2   12.6  14    22.3  28.5
10    2880     48      1.1    2.7    3.7   4.7  10.7  15.1
11    4320     72      0.2    1.9    3     4    12.5  18.9

This approach is much quicker when working on a large number of sites and could be used to automate entry into other programs such as RORB.

The function Get_losses returns the initial and continuing loss from the data hub for any location.

# Example usage
lat = -33.87
lon = 151.206

Get_losses(lat = lat, lon = lon)
# $ILs
# [1] 28
#
# $CL
# [1] 1.6


When to plan a paddling trip on the King River

The King River in northeast Victoria is a fun kayaking destination but it’s only worth going when there is enough flow.  The best white water is the reach between Lake William Hovell and Cheshunt South (map).  According to the Whitehorse Canoe Club, the minimum level for paddling is 0.6 m at the Cheshunt gauge; a good level is 1.0 m.

We can use the data from the gauge, King River at Cheshunt (403227), to work out when to plan a trip.  Figure 1 shows the average daily levels for each day in the period of record (28 June 1967 to 10 March 2020).  I’ve made the points translucent so that data can be seen in areas where there is over-plotting.  The red dashed lines are the river levels of 0.6 m and 1 m.  The blue line is the average (a loess smooth).  The river is often above 0.6 m between early July and late October.
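A sketch of the plotting approach, assuming the gauge record has been read into a data frame king with columns date and level_m (those names are mine, not from the gauge download):

library(ggplot2)
library(lubridate)

ggplot(king, aes(x = yday(date), y = level_m)) +
  geom_point(alpha = 0.05) +          # translucent points to show over-plotting
  geom_smooth(method = 'loess') +     # average level through the year
  geom_hline(yintercept = c(0.6, 1), colour = 'red', linetype = 'dashed') +
  labs(x = 'Day of year', y = 'Level (m)')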


Figure 1: Daily levels, King River at Cheshunt

It is also possible to calculate an empirical estimate of the probability that the river level will be greater than 0.6 m.  This is based on the proportion of days in the record, for each day of the year, on which the river level exceeds 0.6 m (Figure 2).  The best time to plan a trip is during August and early September, when the probability of being able to paddle is greater than 80%.  But be prepared for cool weather: the average maximum temperature in August is only 10.8 °C.
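The calculation behind Figure 2 is short; a minimal sketch with the same assumed king data frame:

library(dplyr)
library(lubridate)

# Proportion of days on record, for each day of the year, above 0.6 m
prob_gt_0p6 = king %>%
  mutate(day_of_year = yday(date)) %>%
  group_by(day_of_year) %>%
  summarise(p_exceed = mean(level_m > 0.6, na.rm = TRUE))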

Before you go, check the level of Lake William Hovell, which is the reservoir upstream of the white-water reach.  The lake must be full and spilling over the dam before the flow will get above 0.6 m at Cheshunt.  You can check the lake levels here.


Figure 2: Probability that the King River level is greater than 0.6 m for each day of the year


The value of a word

This is recreational mathematics rather than hydrology…

I received an email recently about the value of hardwork, knowledge and attitude.  If a, b, c,…,z are represented by the numbers 1, 2, 3,…, 26, then hardwork adds to 98, knowledge to 96 and attitude to 100.

“While hard work and knowledge will get you close, attitude will get you there!”

What about the value of other words?  Here’s an R function that will calculate the value of a word or phrase.

Wordsum = function(my_word){
  x = setNames(1:26, letters)
  x1 = unlist(strsplit(tolower(my_word), ''))  # split into single characters
  sum(x[x1], na.rm = TRUE)
}

Let’s try it.

Wordsum('hydrology')
#129

Wordsum('hydraulics')
#120

Clearly hydrology is more auspicious, but neither compares to geomorphology (171).

1% flood: binomial distribution, conditional probabilities

I previously wrote about considering the occurrence of 1% floods as a binomial distribution; this post extends that analysis to look at conditional probabilities.  Some of the results are counterintuitive, at least to me, in that the risk of multiple 1% floods is larger than I would have guessed.

The probability of a 1% (1 in 100) annual exceedance probability (AEP) flood occurring in any year is 1%.  This can be treated as the probability of a “success”  in the binomial distribution, with the number of trials being the number of years. So the probability of having exactly one 1% flood in 100 years is

{100\choose 1}0.01^{1}\left( 1-0.01\right) ^{99} = 0.37

In R this can be calculated as dbinom(x = 1, size = 100, prob = 0.01) or in Excel as =BINOM.DIST(1, 100, 0.01, FALSE).

The cumulative distribution function of the binomial distribution is also useful for flood calculations.

What is the probability of 2 or more 1% floods in 100 years?

R: pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE) = 0.264

Excel: =1 - BINOM.DIST(1,100, 0.01, TRUE) = 0.264

We can check this by calculating the probability of zero or one flood in 100 years and subtracting that value from 1.

1 - (dbinom(x = 1, size = 100, prob = 0.01) + dbinom(x = 0, size = 100, prob = 0.01)) = 0.264

We can also do conditional probability calculations which could be useful for risk assessment scenarios.

What is the probability that exactly two 1% floods occur in 100 years given that at least one occurs?

\Pr{(X = 2\mid X \ge 1)} =
dbinom(x = 2, size = 100, prob = 0.01)/pbinom(q = 0, size = 100, prob = 0.01, lower.tail = FALSE) = 0.291

What is the probability that at least two 1% floods occur in 100 years given that at least one occurs?
\Pr{(X \ge 2\mid X \ge 1)} =
pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE)/pbinom(q = 0, size = 100, prob = 0.01, lower.tail = FALSE) = 0.416

We can also check this by simulation.  This code generates the number of 1% floods in each of 100,000 100-year sequences.  We can then count the outcomes of interest.

set.seed(1969) # use a random number seed so the analysis can be repeated if necessary
floods = rbinom(100000,100, 0.01) # generate the number of 1% floods in each of 100,000, 100-year sequences

floods_subset = floods[floods >= 1] # Subset of sequences that have 1 or more floods
# Number of times there are two or more floods in the subset of 1 or more floods

sum(floods_subset >= 2) / length(floods_subset)
# 0.4167966

# or
sum(floods >= 2)/sum(floods >= 1)

#[1] 0.4167966

A slightly trickier situation is a question like: what is the probability of three or fewer floods in 100 years given that there is more than one?

\Pr{(X \le 3\mid X > 1)} = \Pr(X \le 3 \cap X > 1 )/\Pr( X > 1) 

floods_subset = floods[floods > 1] # Subset of sequences that have more than one flood

# Number of times there are three or fewer floods in the subset of more than one flood

sum(floods_subset <= 3) / length(floods_subset)
#[1] 0.9310957

# Or, for the exact value

# (Probability that X = 3 + Probability that X = 2)/(Probability that X > 1)
(dbinom(x = 3, size = 100, prob = 0.01) + dbinom(x = 2, size = 100, prob = 0.01))/ pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE)
#[1] 0.9304641


The probability of experiencing at least one 1% flood in 100 years is 1 - (1-0.01)^{100} = 0.634.  How many years would we need to wait to have a 99% chance of experiencing a 1% flood?

0.99 = 1-(1-0.01)^n

n=\frac{\log(0.01)}{\log(0.99)} = 458.2.  The next largest integer is 459.
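Or directly in R:

ceiling(log(0.01) / log(0.99))
# [1] 459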

We can also solve this numerically.  In R, the equation is pbinom(q = 0, size = n, prob = 0.01, lower.tail = FALSE) = 0.99, which we solve for n.  Using the uniroot function gives n = 459 years (see below).

So all these areas subject to a 1% flood risk will flood eventually, but it may take a while.

f = function(n) {
  n = as.integer(n) # n must be an integer
  0.99 - pbinom(q = 0, size = n, prob = 0.01, lower.tail = FALSE)
}

uniroot(f, lower = 100, upper = 1000)
# $root
# [1] 458.4999

pbinom(q = 0, size = 459, prob = 0.01, lower.tail = FALSE)
# [1] 0.990079

How many years before there is a >99% chance of experiencing more than one flood? This is one minus (the probability of zero floods + the probability of one flood).

Let the number of years equal n.

1-\left((1-0.01)^n + n(0.01)(1-0.01)^{n-1}\right) = 0.99.  Solving for n gives 662 years.
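A quick numerical check, using the same uniroot approach as above:

g = function(n) {
  n = as.integer(n) # n must be an integer
  0.99 - pbinom(q = 1, size = n, prob = 0.01, lower.tail = FALSE)
}

uniroot(g, lower = 100, upper = 2000)
# root is just below 662

pbinom(q = 1, size = 662, prob = 0.01, lower.tail = FALSE)
# a little over 0.99, so 662 years is the answer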

Using a Treemap to show gauged areas

According to Wikipedia, a treemap is a method for displaying hierarchical data using nested rectangles.  It’s easier to explain with an example, which also shows how treemaps are useful in hydrology.

In this post we’ll use a treemap to provide information on the catchment of Western Port, a bay to the east of Melbourne; in particular, the proportion of the catchment that is gauged.  By the way, according to the Vicnames database, the official name is Western Port, not Western Port Bay.

There are 4 main gauged tributaries draining to Western Port:

  • Cardinia Creek
  • Bunyip River
  • Lang Lang River
  • Bass River

Figure 1 shows the gauged portions of these catchments overlaid on a shaded relief base map.

WP_shaded

Figure 1: Gauged catchments draining to Western Port

We can also plot the gauged and ungauged portions of these catchments on a treemap, along with the remainder of the Western Port catchment (Figure 2), or just the gauged portions compared to the rest (Figure 3).


Figure 2: Gauged and ungauged portions of the 4 catchments


Figure 3: Gauged portions of the four catchments compared with the rest of the Western Port catchment

If we are going to estimate flow or pollution loads to Western Port then we need to factor up the results from the gauges.  A treemap shows how much extrapolation is necessary.  Code to produce these treemaps is available as a gist.
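The gist has the real code; as a flavour of the approach, here is a minimal sketch using the treemapify package.  The areas below are placeholders, not the actual catchment figures:

library(ggplot2)
library(treemapify)

# Placeholder areas in km2; see the gist for the real values
wp = data.frame(
  catchment = c('Cardinia Creek', 'Bunyip River', 'Lang Lang River',
                'Bass River', 'Remainder'),
  portion   = c('gauged', 'gauged', 'gauged', 'gauged', 'ungauged'),
  area_km2  = c(100, 700, 270, 230, 1400)
)

# Rectangle area is proportional to catchment area
ggplot(wp, aes(area = area_km2, fill = portion, label = catchment)) +
  geom_treemap() +
  geom_treemap_text(colour = 'white', place = 'centre')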