# ARR point temporal patterns

Previous posts (here and here) have looked at Australian Rainfall and Runoff areal temporal patterns and identified some that appear to be unusual and may lead to problems of embedded bursts. This post reviews point patterns, which are suitable for modelling smaller catchments: those with areas less than 75 km2.

Embedded bursts occur when the rainfall associated with a period within a pattern is rarer than the event represented by the pattern as a whole.  Whether embedded bursts will be an issue depends on local values of design rainfall so patterns need to be checked whenever hydrological models are applied to a new location.  These checks are available in many hydrologic modelling programs.

Here I’ve used an approximate method to identify patterns where there is potential for embedded bursts.  This involves the calculation of a non-uniformity index (NUI) (explained here).

The top ten least uniform point patterns are shown in Table 1 with the top 8 graphed in figures 1 and 2.

Table 1: Least uniform point temporal patterns

| EventID | Duration (min) | Duration (hr) | Time step (min) | Region | AEP | NUI | Source Region |
|---|---|---|---|---|---|---|---|
| 1208 | 8640 | 144 | 180 | Rangelands (West) | frequent | 419 | Rangelands (West) |
| 1182 | 10080 | 168 | 180 | Rangelands (West) | intermediate | 416 | Rangelands (West) |
| 1973 | 10080 | 168 | 180 | Rangelands | frequent | 375 | Rangelands |
| 1183 | 8640 | 144 | 180 | Rangelands | intermediate | 357 | Rangelands |
| 1181 | 8640 | 144 | 180 | Rangelands (West) | intermediate | 357 | Rangelands (West) |
| 2870 | 10080 | 168 | 180 | Central Slopes | frequent | 342 | Central Slopes |
| 1083 | 10080 | 168 | 180 | Rangelands | intermediate | 341 | Rangelands |
| 1079 | 10080 | 168 | 180 | Rangelands (West) | intermediate | 341 | Rangelands (West) |

Figure 1: Top 4 least-uniform patterns

Figure 2: Next 4 least-uniform patterns

All of these are long patterns (144 or 168 hours) containing extensive periods of little rain with isolated bursts.  Nine of the 10 patterns are from rangelands: arid areas where long rainstorms are relatively rare.  Deciding on representative patterns for modelling long duration events in these areas is challenging.

### Large peaks 24 hours apart

Analysis of areal patterns showed there were some that had large peaks 24 hours apart, which could have been caused by accumulated rainfall in pluviograph records that had not been processed correctly.  The same analysis was undertaken for point patterns.  The four patterns with the largest peaks 24 hours apart are shown in Figure 3.  None of these appears to be particularly unusual.

Figure 3: Point patterns where large peaks are separated by 24 hours

In summary, point patterns do not appear to have the problems that were identified for areal patterns.  Modellers still need to check for embedded bursts, which depend on local values of design rainfall.

Code to reproduce the figures and analysis is available as a gist.

# ARR areal temporal patterns – II

The previous post looked at some areal temporal patterns from the ARR data hub that seemed odd, with a few large peaks.  Matt Scorah pointed out that the peaks were commonly 24 hours apart so could be caused by accumulated rainfall totals in pluvio records.

This raises the question of whether there are other suspect patterns with large peaks 24 hours apart.  I searched for patterns where the largest and second largest values were each greater than 15% of the pattern total and the difference in time between these values was 24 hours.  Most of the patterns in the previous blog showed up but there were a few others.  The two standouts are patterns 5829 and 5732 (Figure 1) which, as before, originate from the Murray Basin in the area between Canberra and Cooma.  These patterns also occur as Central Slopes eventIDs 4049 and 3951.

Figure 1: Areal temporal patterns with large peaks 24 hours apart.

Code for the analysis is available as a gist.

# Check ARR temporal patterns before you use them

Recently, I had a task to troubleshoot some odd results from a hydrologic model.  High flood discharges were being produced from very long duration events which was not expected.  The issue came down to a few unusual temporal patterns.

The recommendation in Australian Rainfall and Runoff is that an ensemble of 10 temporal patterns be used for modelling.  These depend on the region, catchment area and duration, and are available from the ARR Data Hub.  The temporal pattern regions are shown in Figure 1.

There are two types of patterns: point and areal.  Areal patterns are appropriate when the catchment area exceeds 75 km2. This blog only considers areal patterns.

Figure 1: Regions used for Temporal Patterns (ARR Book 2, Figure 2.5.7)

An example of an unusual pattern follows.  The 10 areal temporal patterns for the Southern Slopes (mainland), for 120 hours duration and a 200 km2 catchment area, are shown in Figure 2.  Pattern 7 (event ID 6884) stands out because of the small number of large values.  The largest peak in the pattern implies that 48.82% of the 120 hour rainfall occurs in a single 180 min (3 hour) increment.  This is likely to cause an intense embedded burst that may produce misleading results from modelling.

Figure 2: Ensemble areal temporal patterns for Southern Slopes (mainland) for 120 hours duration, and catchment area of 200 km2

### Embedded bursts

“Embedded bursts” occur when a period of rainfall within a temporal pattern has an annual exceedance probability rarer than the burst as a whole.  Embedded bursts can cause issues with modelling and modellers should consider whether they should be removed or smoothed (see Book 2, Sections 5.2.1 and 5.9.4; Scorah et al., 2016).  We can check if pattern 7 is likely to lead to such a problem.

The areal temporal patterns in Figure 2 are for the Southern Slopes region which includes Melbourne (Figure 1), so let’s use the Melbourne CBD as an example (lat = -37.8139, lon = 144.9633).  The 1 in 100, 120 hour rainfall for Melbourne is 194 mm (from the BoM).  Applying pattern 7 means there would be a 3 hour period when 48.82% of this rainfall occurs, i.e. 94.7 mm in 3 hours.  Looking at the IFD data for Melbourne shows that an event of 94.7 mm in 3 hours has an AEP of between 1 in 500 and 1 in 1000.  So the 1 in 100, 120 hour storm contains a very rare 3 hour embedded burst.  If we run a model using this pattern, the results are likely to depend on this one 3 hour period of rain rather than being indicative of the long 120 hour storm that was of interest.
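The burst-depth arithmetic in this example is trivial to script.  A Python sketch of the check (a full check would also look up the AEP of the resulting depth in the IFD tables):

```python
def burst_depth(pattern_fraction, design_depth_mm):
    """Depth of the embedded burst implied by the largest pattern increment."""
    return pattern_fraction * design_depth_mm

# values from the worked example above (Melbourne CBD, pattern 7)
depth = burst_depth(0.4882, 194.0)  # 48.82% of the 1 in 100, 120 hour rainfall
print(round(depth, 1))              # 94.7 mm in a single 3 hour increment
```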

### Where did this pattern come from?

The creation of design areal temporal patterns is explained in ARR Book 2, Chapter 5.6.  They are scaled from real storms, but if there were insufficient patterns for a particular region then patterns may be donated from neighbouring regions.  For example, the Southern Slopes region can take patterns from the Murray Basin (see ARR Book 2, Table 2.5.10); pattern 7 is an example of this.  From the data hub, it is possible to download a file that contains information about the storms that were used to create the patterns.  For the Southern Slopes region, this file is Areal_SSmainland_AllStats.csv.  Looking at pattern 7 (Event ID 6884) shows this pattern came from Jerangle, in the Murray Basin, 100 km south of Canberra (lat = -35.8875, lon = 149.3375).  IFD information for this location shows that the 1 in 100, 120 hour storm is 336 mm.  Pattern 7 means that 48.82% of this rain (164 mm) falls in 3 hours.  For the Jerangle region this is much larger than a 1 in 2000 year event (the 1 in 2000 year, 3 hour rainfall is 96.7 mm).  This suggests that pattern 7 is based on a very unusual storm so may not be appropriate for modelling “ordinary risks” such as those associated with 1% events.  Another possibility is that there is an error in the data.

### Searching for other odd patterns

Finding a single strange pattern naturally leads to the problem of searching for others.  Whether a rare embedded burst is present depends on both the temporal pattern and the rainfall characteristics of a location, but it was not practical to obtain and process pattern and IFD data for the whole of Australia.  Instead, a non-uniformity index (NUI) was developed, which is a measure of the difference between an ARR temporal pattern and a uniform pattern (one with the same rainfall in each increment).  The details of the NUI are provided below.  A high value of the NUI won’t necessarily indicate an issue with embedded bursts but will identify patterns that require checking.  The NUI was calculated for all 10,692 areal temporal patterns on the data hub.  The 17 least-uniform patterns are listed in Table 1.

The first 3 entries are actually the same pattern, which originated in the Murray Basin and has been used in the neighbouring regions of Central Slopes and Southern Slopes (mainland).  The next two patterns are also the same and also originated in the Murray Basin, and there are similar examples throughout the table.  Of the 17 least-uniform patterns, 15 are from the Murray Basin and all originated in the Jerangle area, in 5 separate but nearby locations, as mapped in Figure 3.  The pattern listed in row 14 originated near Charleville, Qld, about 600 km west of Brisbane, and the final pattern in the table, from the Monsoonal North, is based on a storm from Rum Jungle, 100 km south of Darwin, NT.

Table 1: Least uniform areal temporal patterns

| No. | Event ID | Duration (hour) | Region | NUI | Source region | Area (km2) | Lat | Lon |
|---|---|---|---|---|---|---|---|---|
| 1 | 4207 | 120 | Central Slopes | 481 | Murray Basin | 200 | -35.89 | 149.34 |
| 2 | 5989 | 120 | Murray Basin | 481 | Murray Basin | 200 | -35.89 | 149.34 |
| 3 | 6884 | 120 | Southern Slopes (mainland) | 481 | Murray Basin | 200 | -35.89 | 149.34 |
| 4 | 4119 | 96 | Central Slopes | 385 | Murray Basin | 200 | -35.89 | 149.34 |
| 5 | 5901 | 96 | Murray Basin | 385 | Murray Basin | 200 | -35.89 | 149.34 |
| 6 | 5644 | 36 | Murray Basin | 378 | Murray Basin | 500 | -35.91 | 149.46 |
| 7 | 3950 | 48 | Central Slopes | 347 | Murray Basin | 500 | -35.96 | 149.51 |
| 8 | 5731 | 48 | Murray Basin | 347 | Murray Basin | 500 | -35.96 | 149.51 |
| 9 | 4116 | 96 | Central Slopes | 335 | Murray Basin | 200 | -35.94 | 149.49 |
| 10 | 5898 | 96 | Murray Basin | 335 | Murray Basin | 200 | -35.94 | 149.49 |
| 11 | 6797 | 96 | Southern Slopes (mainland) | 335 | Murray Basin | 200 | -35.94 | 149.49 |
| 12 | 4029 | 72 | Central Slopes | 307 | Murray Basin | 200 | -35.91 | 149.41 |
| 13 | 5810 | 72 | Murray Basin | 307 | Murray Basin | 200 | -35.91 | 149.41 |
| 14 | 4170 | 96 | Central Slopes | 252 | Central Slopes | 10000 | -26.39 | 147.71 |
| 15 | 4043 | 72 | Central Slopes | 229 | Murray Basin | 500 | -35.96 | 149.51 |
| 16 | 5822 | 72 | Murray Basin | 229 | Murray Basin | 500 | -35.96 | 149.51 |
| 17 | 5004 | 96 | Monsoonal North | 210 | Monsoonal North | 100 | -13.06 | 130.96 |

Figure 3: Map showing the origins of 15 of the 17 patterns in Table 1.  This area is about 100 km south of Canberra, ACT

The patterns listed in Table 1 are plotted as follows. Murray Basin pattern 5989 (row 2) is shown in Figure 2 as pattern 7 (Southern Slopes eventID 6884).  The next 4 Murray Basin patterns from Table 1 (rows 5, 6, 8 and 10) are shown in Figure 4 below.  All have similar characteristics, with one increment producing 40% to 60% of the rain for the whole pattern.  Figure 5 shows the next set of patterns (rows 13, 14, 16 and 17 in Table 1).  By the time we get down to row 17 (Monsoonal North eventID 5004) the pattern appears closer to expectations.

Figure 4: Non-uniform patterns specified in Table 1 rows 5, 6, 8 and 10

Figure 5: Non-uniform patterns specified in Table 1 rows 13, 14, 16 and 17

At least to me, it’s not clear why a range of highly non-uniform patterns should all originate in the same area of NSW (Figure 3), but it does suggest the original data should be checked.  In the interim, modellers could consider excluding these patterns and running ensemble or Monte Carlo analyses without them.

I haven’t looked at point temporal patterns but it would seem appropriate to check that they are reasonable and suitable for modelling.

Analysis is available as a gist.

### References

Scorah, M., P. Hill, S. Lang and R. Nathan (2016). Addressing embedded bursts in design storms for flood hydrology 37th Hydrology & Water Resources Symposium. Tasmania, Engineers Australia. (link)

### Appendix: NUI non-uniformity index

A non-uniformity index was used to rank patterns and find the least uniform.  These were likely to contain rare embedded bursts.  The approach was to use the chi-squared probability calculated using a quantile based on the difference between a particular temporal pattern and a uniform pattern. The number of degrees of freedom was the number of increments in the pattern minus one.

The quantile was calculated as:

$Q = \sum \frac{(P_p - P_u)^2}{P_u}$

Where

• $P_p$ is the proportion of rainfall in an increment in the pattern
• $P_u$ is the proportion of rainfall in an increment in a uniform pattern
• The sum is over all the increments in the pattern
• Degrees of freedom (df) is the number of increments less one

From the value of $Q$ and df, find the probability $p$.  I used the pchisq function in R to do this.
This probability is very close to one for the highly non-uniform patterns, so it was transformed to a range that is easier to work with, with maximum values of about 500 for the most extreme patterns listed in Table 1.

$\mathrm{NUI} = -log(-log(p))$
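The NUI calculation can be sketched in code.  The original analysis was in R (using pchisq); the Python sketch below uses scipy instead, and assumes the increment proportions are expressed as percentages, which gives magnitudes broadly consistent with Table 1.  Because $p$ is numerically indistinguishable from one for extreme patterns, the log-scale chi-squared functions are used:

```python
import numpy as np
from scipy.stats import chi2

def nui(pattern_pct):
    """Non-uniformity index for a temporal pattern.

    pattern_pct: rainfall in each increment as a percentage of the
    pattern total (summing to about 100)."""
    pp = np.asarray(pattern_pct, dtype=float)
    n = len(pp)
    pu = 100.0 / n                    # uniform pattern: equal percentage per increment
    q = np.sum((pp - pu) ** 2 / pu)   # the quantile Q defined above
    df = n - 1
    logp = chi2.logcdf(q, df)         # log(p), accurate when p is small
    if logp < -1e-12:
        return float(-np.log(-logp))  # exact -log(-log(p))
    # p rounds to 1 in floats; -log(p) is then ~ 1 - p,
    # so NUI ~ -log of the survival function
    return float(-chi2.logsf(q, df))

# hypothetical 120 hour pattern: 40 three-hour increments with a 48.82% spike
spike = [48.82] + [51.18 / 39] * 39
# near-uniform pattern for comparison
flat = [2.5 + 0.1 * (-1) ** i for i in range(40)]
```

With these inputs the spiked pattern scores in the hundreds, while the near-uniform pattern scores below zero.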

# Fitting a probability model to POT data

Australian Rainfall and Runoff 2016 has an example of fitting a probability model to POT (peak over threshold) data (Section 2.8.11 in Book 3).  It took me a long time to understand this example.  For those similarly challenged, I hope this blog will help make it easier.

Calculations are available as an excel spreadsheet; R code is available as a gist. The R code reproduces all the results and figures from this blog.

The case study is the Styx River at Jeogla, with the POT series providing 47 peaks above the threshold of 74 m3/s for a period of record of 47 years.  The data are as follows:

878 541 521 513 436 411 405 315 309 301 300 294 283 258 255 255
238 235 221 220 206 196 194 190 186 164 150 149 134 129 129 126
119 118 117 117 111 108 105 98 92.2 92.2 91.7 88.6 85.2 79.9 74

Our goal is to fit a probability model to these data and estimate flood quantiles.

You can download the data as a csv file here.  This is the same data set that was used in the 1987 edition of ARR (Book IV, Section 2.9.5).  I’ve sourced the date of each flood from ARR1987 and included it in the file.  Interestingly, the earliest flood discharge is from 1943 and the last from 1985, a record of 43 years, not 47.  This is because the floods from 1939 to 1942 were all less than the 74 m3/s threshold (they were: 58.0, 13.0, 22.1, and 26.1 m3/s).

Start by calculating the plotting positions for these data points.  ARR2016 recommends using the inverse of the Cunnane plotting position formula to calculate recurrence intervals (see ARR 2016 equation 3.2.79, in Book 3, Chapter 2.7.1).

$T = \frac{n+0.2}{R-0.4}\qquad \qquad \mathrm{(1)}$

Where:

• $T$ is the average interval between exceedances of the given discharge (years)
• $R$ is the rank of flows in the POT series (largest flow is rank 1)
• $n$ is the number of years of data.

Note that $n$ is the number of years of record, not the number of peaks.

Also, it’s important to recognise that equation 1 works differently for the POT series compared to the annual series.  Let’s look at an example that shows this difference.  Consider the case where we were analysing 94 peaks from a POT series from 47 years of data.  In that case, the smallest peak would have a rank of 94.  The recurrence interval would be:

$T = \frac{47+0.2}{94-0.4} = 0.504$

This is about half a year, or 6 months, which seems reasonable: there were 94 peaks in 47 years, so there are 2 peaks per year on average and they will be about 6 months apart on average.

Interpreting the Cunnane plotting position as annual exceedance probability would give a result of:

$P = \frac{94-0.4} {47+0.2}= 1.98$

That is, the probability is greater than 1 which doesn’t make sense.  So we can’t use a plotting position formula to calculate the AEP of data from a peak-over-threshold series.  Instead, the inverse of equation 1 can be interpreted as EY, the expected number of exceedances per year.
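The plotting position arithmetic can be scripted directly from equation 1.  A Python sketch (the function name is mine):

```python
def recurrence_interval(rank, n_years):
    """Inverse Cunnane plotting position (equation 1); returns T in years."""
    return (n_years + 0.2) / (rank - 0.4)

# smallest of 94 POT peaks in 47 years of record
T = recurrence_interval(94, 47)
EY = 1 / T          # expected number of exceedances per year
print(round(T, 3))  # 0.504
```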

The ARR example fits the exponential distribution as the probability model.  This has two parameters, $q_*$ and $\beta$, which can be determined by calculating the first and second L moments of the POT series data.

L1 is just the mean of the POT series (226.36 m3/s).  To calculate L2 we use the formula developed by QJ Wang (Wang, 1996):

$L2 = \frac{1}{2} \frac{1}{^nC_2} \sum\limits^n_{i=1}\left( ^{i-1}C_1-^{n-i}C_1\right)q_i\qquad \qquad \mathrm{(2)}$

Where:

• $q_i$ are the flows (the POT series) in order from smallest to largest
• $n$ is the number of flows

L2 = 79.12

I’ve calculated L2 using this formula, as shown in the companion spreadsheet, but it does require a fudge: Excel doesn’t automatically recognise that $^0C_1$ is equal to zero.  This is a standard mathematical convention that Excel doesn’t seem to implement.
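The L moment calculations are easy to replicate outside Excel.  The author’s code is in R; the sketch below is Python, using the flows listed above.  The $^0C_1 = 0$ convention falls out of the arithmetic, so no fudge is needed:

```python
import math

flows = [878, 541, 521, 513, 436, 411, 405, 315, 309, 301, 300, 294, 283, 258, 255, 255,
         238, 235, 221, 220, 206, 196, 194, 190, 186, 164, 150, 149, 134, 129, 129, 126,
         119, 118, 117, 117, 111, 108, 105, 98, 92.2, 92.2, 91.7, 88.6, 85.2, 79.9, 74]

q = sorted(flows)    # ascending, as equation 2 requires
n = len(q)
l1 = sum(q) / n      # first L moment: the mean
# Wang (1996) direct estimate of L2: C(i-1, 1) = i - 1 and C(n-i, 1) = n - i
l2 = sum(((i - 1) - (n - i)) * qi for i, qi in enumerate(q, start=1)) / (2 * math.comb(n, 2))
beta = 2 * l2        # exponential distribution parameters
q_star = l1 - beta
print(round(l1, 2), round(l2, 2), round(beta, 2), round(q_star, 2))
# 226.36 79.12 158.24 68.12
```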

The parameters of the exponential distribution can be expressed in terms of the L moments (see ARR2016, Table 3.2.3).  We are using the formulas for the Generalized Pareto (GP) distribution but setting $\kappa$ to zero, which results in the exponential distribution.  The GP distribution may be useful in real world applications.

For the exponential distribution:

$\beta = 2 \times L2 = 158.24$
$q_* = L1- \beta = 68.12$

The example in ARR doesn’t include confidence intervals for these parameter estimates, but these can be calculated using bootstrapping.

The 95% confidence intervals are (120.6, 244.7) for $\beta$ and (37.4, 89.4) for $q_*$.  The correlation between the parameters is about -0.5.  Uncertainty in the parameters is shown graphically in Figure 1.

Figure 1: Uncertainty in parameters.  Ellipse shows 95% confidence limit

The exponential distribution relates the exceedance probability to the size of a flow. In this case, the probability that a peak flow, $q$, exceeds some value $w$ in any POT event is:

$P(q>w) = \mathrm{e}^{\left( -\frac{w-q_*}{\beta} \right)}\qquad \qquad \mathrm{(3)}$
$P(q>w)= \mathrm{e}^{\left( -\frac{w-68.12}{158.24} \right)} \qquad \qquad \mathrm{(4)}$

The ARR example cross-references these formulas to Table 3.2.3, but the reference should be to Table 3.2.1. The final formula in Table 3.2.1 shows the cumulative distribution function for the exponential distribution. This is for the non-exceedance probability. In hydrology we usually use exceedance probabilities (the probability that a flood is greater than a certain value), therefore the required equation is one minus the equation shown.

We can convert the POT probabilities to the expected number of exceedances per year, $EY$, by multiplying by the average number of flood peaks above the threshold per year.  The example in ARR states that the threshold is $q_*$, i.e. 68.12 m3/s, but the threshold used to create the POT data was 74 m3/s.  I’m not sure, but I think we should use the 74 m3/s threshold.  There is no information available on how many peaks there are above 68.12 m3/s.  It is likely the ambiguity has arisen because of a typo in the example, i.e. the threshold should not have been related to $q_*$.

Sticking with the 74 m3/s threshold, there are 47 values in the POT series and 47 years of data, so the average number of peaks per year is 1.  Therefore, EY can be expressed as follows:

$EY(w) = \nu P(q>w) =\nu \mathrm{e}^{ \left( -\frac{w-q_*}{\beta} \right)}\qquad \qquad \mathrm{(5)}$

Where $\nu$ is the average number of POT floods per year, which is 1 in this case.  In real applications, $\nu$ is likely to be greater than 1.  The discussion in ARR2016 mentions that the flow threshold is typically selected so that there are 2 to 3 times more peaks in the POT series compared to the annual series (Book 3, Chapter 2.2.2.2), although Chapter 2.3.4 notes there are circumstances where the POT and annual series should have the same number of peaks.

$w$ can also be related to  $EY(w)$ as follows:

$w = q_* - \beta \log \left( \frac{EY(w)}{\nu} \right)\qquad \qquad \mathrm{(6)}$

We can also relate $EY$ to annual exceedance probability by the formula:

$AEP = \frac{\mathrm{e}^{EY}-1}{\mathrm{e}^{EY}}\qquad \qquad \mathrm{(7)}$

Therefore, a flood magnitude, $w$ can be related to an AEP, and an AEP to a flood magnitude, $w$, through the following pair of equations (ARR2016 Equation 3.2.11).

$AEP = 1-\exp \left( -\exp \left(- \frac{w-q_*}{\beta} \right) \right)\qquad \qquad \mathrm{(8)}$

$w = q_* - \beta \left( \log \left( -\log \left( 1-AEP \right) \right) \right)\qquad \qquad \mathrm{(9)}$

These equations can be used to estimate quantiles at standard EY values.  Confidence intervals (95%) have been estimated using bootstrapping (Table 1).

Table 1: Flood quantiles and 95% confidence limits at standard EY values

| EY | AEP | AEP (1 in X) | ARI | Q (m3/s) | Lwr | Upr |
|---|---|---|---|---|---|---|
| 1.00 | 0.63 | 1.6 | 1.0 | 68 | 39 | 90 |
| 0.69 | 0.50 | 2.0 | 1.4 | 127 | 106 | 150 |
| 0.50 | 0.39 | 2.5 | 2.0 | 178 | 151 | 215 |
| 0.22 | 0.20 | 5.1 | 4.5 | 308 | 253 | 404 |
| 0.20 | 0.18 | 5.5 | 5.0 | 323 | 267 | 427 |
| 0.11 | 0.10 | 9.6 | 9.1 | 417 | 335 | 565 |
| 0.05 | 0.05 | 20.5 | 20.0 | 542 | 434 | 754 |
| 0.02 | 0.02 | 50.5 | 50.0 | 687 | 548 | 965 |
| 0.01 | 0.01 | 100.5 | 100.0 | 797 | 627 | 1167 |
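With the fitted parameters, equations 6 and 8 reproduce the quantile column of Table 1 (the confidence limits still need the bootstrap).  A Python sketch:

```python
import math

beta, q_star, nu = 158.24, 68.12, 1.0

def quantile(ey):
    """Flow with an expected number of exceedances per year of ey (equation 6)."""
    return q_star - beta * math.log(ey / nu)

def aep(w):
    """Annual exceedance probability of flow w (equation 8, with nu = 1)."""
    return 1 - math.exp(-math.exp(-(w - q_star) / beta))

print(round(quantile(1.0)))   # 68
print(round(quantile(0.5)))   # 178
print(round(quantile(0.01)))  # 797
```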

Now plot the data and the fit (Figure 2).  The x values are the EY calculated from the inverse of equation 1; the y values are the flows.  The fit is based on equation 6.  Confidence intervals for quantiles were calculated using bootstrapping.

Figure 2: Partial series for the Styx River at Jeogla showing fit  and confidence intervals

Personally, I prefer the graph to increase to the right, which is the way flood frequency curves are usually plotted; it’s also the way a similar example was presented in the 1987 edition of Australian Rainfall and Runoff. This just involves using the ARI on the x-axis (Figure 3).

Figure 3: Partial series for the Styx River at Jeogla showing fit  and confidence intervals (same as Figure 2 except increasing to the right)

To see more, check out the excel spreadsheet and R code gist.

I’d also like to thank Matt Scorah for his help in clarifying the process of fitting POT data.

### References

Wang, Q. J. (1996). “Direct sample estimates of L moments.” Water Resources Research 32(12): 3617-3619.

# Time of concentration: comparison of formulas

In the last post, we looked at time of concentration data assembled by Pilgrim and McDermott and the development of their formula.

$t_c = 0.76A^{0.38}$ (hours)    (equation 1)

where

• tc is time of concentration in hours
• A is catchment area in km2.
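Equation 1 is a one-liner in code; for example, it gives about 4.4 hours for a 100 km2 catchment.  A Python sketch:

```python
def tc_pilgrim_mcdermott(area_km2):
    """Time of concentration in hours (equation 1), catchment area in km^2."""
    return 0.76 * area_km2 ** 0.38

print(round(tc_pilgrim_mcdermott(100), 1))  # 4.4
```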

Here we compare the performance of both the Pilgrim McDermott and Bransby Williams formulas with measured values.

A plot of measured and predicted times of concentration using the Pilgrim McDermott method is shown in Figure 1.  This is based on the 95 measurements that they used to develop their formula so, not surprisingly, the fit looks ok.  The figure also shows a smooth fit through the data (the blue dashed line) which closely tracks the 1:1 line.

An insightful comment from one of my undergraduate lecturers was that “anything looks good on a log-log plot”, so let’s check the fit without a log transformation (Figure 2).  A noticeable feature is the increasing variance with increasing time, and a tendency to under-predict for large times.  The Nash-Sutcliffe coefficient is 0.85 for the log transformed data and 0.70 without transformation.
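The Nash-Sutcliffe coefficient used throughout this comparison is simple to compute.  A minimal Python sketch (for the log transformed version, pass log-transformed series):

```python
def nash_sutcliffe(observed, modelled):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
    predicting the mean of the observations, negative is worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```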

Figure 1: Comparison of modelled and observed time of concentration values, Pilgrim McDermott formula, log scale

Figure 2: Comparison of modelled and observed time of concentration values, Pilgrim McDermott formula, linear scale

Now let’s check the Bransby Williams fit.  The formula, as recommended in Australian Rainfall and Runoff, 1987 edition is:

$t_c = \frac{58L}{A^{0.1} S^{0.2}}$

Where:

• $t_c$ is the time of concentration in minutes
• $S$ is the slope (m/km)
• $L$ is the main stream length measured to the catchment divide (km)
• $A$ is the area of the catchment (km2)

Both average slope and equal area slope are used. ARR 1987 acknowledges that Bransby Williams used the average slope but preferred the equal area slope, noting the reasonable fit to data in French et al. (1974) (although French et al. used the average slope in their analysis).
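For reference, the Bransby Williams formula above can be scripted the same way; the catchment values below are hypothetical, purely for illustration:

```python
def tc_bransby_williams(length_km, area_km2, slope_m_per_km):
    """Time of concentration in minutes (ARR 1987 form of Bransby Williams)."""
    return 58 * length_km / (area_km2 ** 0.1 * slope_m_per_km ** 0.2)

# hypothetical catchment: 10 km stream, 50 km^2 area, equal area slope of 5 m/km
tc_min = tc_bransby_williams(10, 50, 5)
print(round(tc_min / 60, 1))  # time of concentration in hours: 4.7
```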

Figure 3 shows the quality of the fit using both slopes.  There is a tendency for large over-prediction as time of concentration increases.  The Nash-Sutcliffe coefficients for average and equal area slope are 0.52 and 0.60 for log transformed data, and -2.3 and -4.4 for non-transformed data.

Figure 3: Comparison of modelled and observed time of concentration values, Bransby Williams formula, linear scale

Issues with the Bransby Williams formula were one of the motivations for the work undertaken by Pilgrim and McDermott.  Bransby Williams was also found to be inaccurate in comparisons with 129 catchments in the UK (Beran, 1979).

Based on the available Australian data, the Pilgrim McDermott formula is more accurate and should be used in preference unless there is local data that supports the application of Bransby Williams or some other approach.  Of course, if you are using the Probabilistic Rational Method to estimate flows in ungauged catchments, the Pilgrim McDermott formula should be applied as it was used in the derivation of the runoff coefficients that are mapped in the 1987 edition of Australian Rainfall and Runoff.

Code to produce the figures and analysis is available as a gist.

### References

Beran, M. (1979) The Bransby Williams formula: an evaluation.  Proceedings of the Institution of Civil Engineers. 66(2): 293-299 (https://doi.org/10.1680/iicep.1979.2355).  Also see the discussion (https://doi.org/10.1680/iicep.1980.2512).

Bransby Williams, G. (1922). Flood discharge and the dimensions of spillways in India. The Engineer (London) 121: 321-322. (link)

French, R., Pilgrim, D.H., Laurenson, E.M., 1974. Experimental examination of the rational method for small rural catchments. Civ. Eng. Trans. Inst. Engrs. Aust. CE16: 95–102.

McDermott, G. E. and D. H. Pilgrim (1982). Design flood estimation for small catchments in New South Wales, Department of National Development and Energy and Australian Water Resources Council. Research project No. 778/104. Technical paper No. 73.  Australian Government Publishing Service.

Pilgrim, D. H. and G. E. McDermott (1982). Design floods for small rural catchments in Eastern New South Wales. Civil Engineering Transactions CE24: 226-234.

# Time of concentration: some data

I’ve previously written about time of concentration formulas including the approaches developed by Bransby Williams and Pilgrim and McDermott.  Pilgrim and McDermott also collected data on time of concentration from Australian catchments.  This blog explores the data and makes it available as a CSV file.

“Time of concentration” is often assumed to be the time for runoff to flow from the furthest reaches of a catchment to the outlet.  Pilgrim and McDermott take a different tack, and use the “characteristic response time of a catchment” which is the minimum time of rise of the flood hydrograph.  This is the time taken for flow to increase from an initial low value to the hydrograph peak (French et al., 1974).  They determine this characteristic response time from measurements at gauging stations and argue that it is a better approach because it is empirically based (McDermott and Pilgrim, 1982).

Pilgrim and McDermott also measured catchment characteristics: stream length, average slope, equal area slope and catchment area.  They explored the data and ultimately adopted the formula that relates time of concentration to catchment area.

$t_c = 0.76A^{0.38}$ (hours)    (equation 1)

where A is measured in km2.

Characteristic response times were collected for 95 stream gauging sites from Queensland, New South Wales, Victoria and Tasmania.  Those sites where location information is available (62 of 95) are shown in Figure 1.  There are lots of sites in the Snowy Mountains and around Sydney and a scattering of sites elsewhere.

Figure 1: Locations of gauges (red dots) used to derive the Pilgrim McDermott formula for the time of concentration

Let’s have a look at the data.  Figure 2 shows the relationship between response time and catchment area.  The regression fit to the logged data is the same as that reported by Pilgrim and McDermott, which is a good start.  The fit seems reasonable; the Nash-Sutcliffe coefficient and the adjusted r-squared are both 0.85.

However, there is substantial uncertainty.  Also plotted on Figure 2 are the 95% prediction intervals which show that a large range of response times are possible for any given catchment size.  For example, the fitted value for a 100 km2 catchment is 4.4 hours but the upper and lower prediction limits are 1.8 hours and 10.5 hours.
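Prediction intervals like these come from the standard OLS formula applied on the log scale.  Below is a Python sketch on synthetic data (the real measurements are in the csv file linked in this post; the function name and noise level are my own assumptions):

```python
import numpy as np
from scipy import stats

def loglog_prediction_interval(area, tc, new_area, level=0.95):
    """Fit log10(tc) = a + b*log10(area) by OLS and return (fit, lower, upper)
    for a new catchment area, back-transformed to hours."""
    x, y = np.log10(area), np.log10(tc)
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error
    x0 = np.log10(new_area)
    se = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t = stats.t.ppf((1 + level) / 2, n - 2)
    fit = a + b * x0
    return 10 ** fit, 10 ** (fit - t * se), 10 ** (fit + t * se)

# synthetic stand-in for the 95-catchment dataset
rng = np.random.default_rng(0)
area = 10 ** rng.uniform(0, 3, 95)
tc = 0.76 * area ** 0.38 * 10 ** rng.normal(0, 0.15, 95)
fit, lower, upper = loglog_prediction_interval(area, tc, 100)
```

The width of the back-transformed interval illustrates the point made above: the interval is a multiplicative factor either side of the fit, so it is wide in absolute terms for large catchments.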

Figure 2: Catchment response time as a function of area.  Figure shows the fitted equation (equation 1) and the 95% prediction intervals

The relationship between catchment response time and the other catchment characteristics is interesting (Figure 3 and Figure 4).  There is a strong association with stream length and catchment slope (either equal area or average slope).  Steep, short streams respond quickly so have shorter characteristic times.

Pilgrim and McDermott considered including stream length and slope in the equation to predict time of concentration but found they were highly correlated with area, which we can see in figures 5 and 6.  These figures show that large catchments have longer stream lengths, which is expected.  Also, large catchments have lower slopes.  The upshot is that the Pilgrim McDermott formula takes account of the effect of stream length and slope because of their correlation with area.  Including these variables in the formula for time of concentration only made, at best, a marginal improvement in accuracy: the adjusted r-squared increased from 0.848 to 0.854 with the inclusion of slope, and there was no advantage from including stream length.  An equation based solely on catchment area is also simpler to use.

Figure 3: Catchment response time as a function of slope

Figure 4: Catchment response time as a function of stream length

Figure 5: Stream length as a function of area

Figure 6: Relationship between catchment area and catchment slope

The data set used by Pilgrim and McDermott is available in a csv file.  Code to reproduce the figures and analysis is available as a gist.

### References

French, R., Pilgrim, D.H., Laurenson, E.M., 1974. Experimental examination of the rational method for small rural catchments. Civ. Eng. Trans. Inst. Engrs. Aust. CE16: 95–102.

Grimaldi, S., Petroselli, A., Tauro, F. and Porfiri, M. (2012) Time of concentration: a paradox in modern hydrology.  Hydrological Sciences Journal 57(2): 217-228. https://www.tandfonline.com/doi/full/10.1080/02626667.2011.644244?scroll=top&needAccess=true

McDermott, G. E. and D. H. Pilgrim (1982). Design flood estimation for small catchments in New South Wales, Department of National Development and Energy and Australian Water Resources Council. Research project No. 778/104. Technical paper No. 73.  Australian Government Publishing Service.

Pilgrim, D. H. and G. E. McDermott (1982). Design floods for small rural catchments in Eastern New South Wales. Civil Engineering Transactions CE24: 226-234.

# Victoria won the hydrologic games, and they may not have cheated

As usual, the hydrologic games were held during the HWRS2018 conference dinner, and this time Victoria won.  Competitors threw a dart to select an initial loss and rainfall temporal pattern which were used as input to a RORB model to simulate a 1% AEP peak flow.  The team with the highest average peak was the winner.

Figure 1: Results of the hydrologic games (Source: A post by Ben Tate on LinkedIn)

Before the games started, I thought one of the small teams would win.  There were only two international competitors and six from Tasmania, but likely over 100 from Victoria.  Extreme results are more likely from small samples so I expected small teams to have both the highest and lowest scores with the Victorian team to be in the middle, near the true mean.

There is a nice discussion of the effect of sample size on variation in the wonderful book Thinking Fast and Slow by Daniel Kahneman (see the start of Chapter 10, “The law of small numbers”), and also in Howard Wainer’s article The Most Dangerous Equation.  Wainer argues that misunderstanding of this effect has caused confusion and wasted effort. For example, the small-school movement was based on the idea that students performed better in smaller schools than in larger ones.  In the US, grants from the Gates Foundation were used to facilitate the conversion of large schools to several smaller schools.  However, evidence presented by Wainer shows the small-school effect is likely the result of the extra variation when only a small number of students are assessed.  Comparisons of many schools showed the smallest schools achieving both the highest and lowest scores.

Back to the hydrologic games.  How likely is it that the mean of a large sample, the Victorians, would be higher than all the means from a number of smaller samples?  I don’t know how many people participated in the hydrologic games, so let’s guess.  Ben says there were “over 150 players”; I’ve divided these up as shown in Table 1.

Table 1: Teams for the hydrologic games

Now, to simulate the game we draw random numbers from the normal distribution.  The Victorian score will be the mean of 100 random numbers, the NSW score will be the mean of 15 random numbers, and so on.  Victoria wins if its mean exceeds the mean of every other team.

Using this approach, the probability of Victoria winning is estimated to be 0.00830 (1 in 120), based on the mean of 100 sets of 100,000 simulations.  So it’s less likely than seeing a 1% AEP flood.  If a hydrologist attends 20 hydrologic games during their career, and this game was played every time, the probability that they would see Victoria win at least once is about 15%.
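The simulation is straightforward to script.  The team sizes below are my guesses (only the “over 150 players” total and the Victorian contingent of about 100 are stated); with them, the estimate lands at a similar order to the 1 in 120 quoted above:

```python
import numpy as np

# assumed split of "over 150 players"; Victoria first, then the smaller teams
team_sizes = [100, 15, 12, 8, 6, 6, 5, 2]

def vic_win_probability(n_sims=20000, seed=42):
    rng = np.random.default_rng(seed)
    # each team's score is the mean of one standard normal draw per player
    means = np.column_stack(
        [rng.normal(size=(n_sims, n)).mean(axis=1) for n in team_sizes]
    )
    vic, best_other = means[:, 0], means[:, 1:].max(axis=1)
    return np.mean(vic > best_other)

print(vic_win_probability())
```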

The upshot is that when Victoria won, and provided the games were fair, we witnessed a rare, but not impossible event.

Calculations are available as a gist.