Australian Rainfall and Runoff: Highlights from the Kuczera Symposium

The Kuczera Symposium was held on June 12 and 13 which was a wonderful celebration of George’s contribution to hydrology and water resources.  There was much discussion about Australian Rainfall and Runoff (ARR) on Day 1.  Some of the things that I noted were:

• More examples have been added in the ARR 2019 version of the Urban Book (book 9).
• Losses in urban areas: There is advice associated with the losses available from the ARR data hub that: These losses are only for rural use and are NOT FOR DIRECT USE in urban areas.  It is clear in the ARR text, that despite this advice, these loses may be used for pervious areas within urban catchments (Book 9, Chapter 6.4.2 and Book 5 Chapter 3.5.3).
• There are differences in the terminology used by ARR and the Bureau of Meteorology.  For example, ARR defines rare rainfall as 1 in 10 AEP to 1 in 100 AEP (Book 1, Figure 1.2.1).  The BoM defines rare rainfall as 1 in 100 to 1 in 2000 (See the BoM IFD webpage).
• Rural losses, available from the data hub for the whole of Australia, were estimated from only 38 small catchments.  Therefore local loss estimates should be used where available.
• Some ARR data are of high quality e.g. IFD data from the Bureau, other data are only intended to be used when there is nothing better available e.g. losses from the data hub.
• Research in NSW shows that using median pre-burst rainfall to calculate burst losses to use for modelling, results in over-estimation of these losses and design flood estimates that are too low.  Work has been done in NSW to provide more accurate burst initial losses (Read more here and here).  Similar work has yet to be undertaken in other states.
• The continuing losses from the data hub were shown to be too high in NSW (Read more in the WMA water report).
• There are some rogue temporal patterns in the ensembles provided on the data hub.  Check and if necessary remove problem patterns.  For more detail on patterns see earlier blog posts (here and here).
• When undertaking flood frequency analysis where there are many small floods, it may be better to use the GEV distribution fitted with L moments rather than discarding a large amount of data as suggested by the Grubbs-Beck test.
• More peaked temporal patterns have a clear association with increasing temperatures (Wasko and Sharma, 2015).  To take this into account we may need to weight peaked temporal patterns more highly in ensemble analysis so they are selected more frequently than current approaches where all patterns are selected with the same probability.
• There needs to be more testing of ARR procedures where all the various inputs and methods are applied. The work by WMA water on 165 catchments in NSW is a good start (link to report).
• The management framework to take ARR to the next step is yet to be worked out.  There was discussion around which institution should lead the next phase and who would be involved. This is yet to be resolved.

The accuracy of the RFFE in Victoria

The previous post looked at the accuracy of the RFFE in NSW, here the analysis is extended to Victoria.  Again, the focus is on how well the RFFE predicts the 1% AEP flood compared to at-site flood frequencies.

There isn’t a readily available independent set of flood frequency estimates to use for comparison, so an alternative approach was developed.  The input data for the RFFE includes 186 gauges in Victoria and with nearby NSW gauges that are likely to be included in a region of influence for a Victoria site, the total number of gauges comes to 205.  For these 205 gauges we can obtain the at-site flood frequency estimates for the 1% flood from the RFFE input data.  We can also request the RFFE tool to return 1% AEP estimates at these same sites.

A comparison of the results is shown as a scatter plot in Figure 1.  The points generally lie along the 1:1 line albeit with substantial variation.

Figure 1: Comparison of RFFE estimates: 1:1 line is shown dashed; blue line is a smooth (loess) through the data.

The same data are shown Figure 2 which plots the ratio of each pair of estimates against their average (a Bland-Altman plot).  The approximate 95% limits of agreement are calculated as the overall average of the logs of the ratios +/- two times the standard deviation of the logs of the ratios.  In this case, the limits of agreement are 5.45 and 0.22 which means, on overage, 95% of at-site estimates will be between 5.45 and 0.22 times the  RFFE estimates.  There is also a small under-prediction bias of 10% (the difference between the average of the ratios and a value of 1).  If all the RFFE outputs were multiplied by 1.1, this would be eliminated.

Figure 2: Comparison as a Bland Altman plot, details of outliers are provided in Table 1

Note that this scatter in Figures 1 and 2 is not wholly caused by problems with the input data since we are comparing outputs where inputs are available.  The uncertainty is largely due to the algorithm used to make the estimates of the 1% AEP values.   This also means the uncertainty shown here is underestimate of the total error when predicting 1% AEP values at an ungauged site.

To improve the RFFE we  need to include better input data but also improve the prediction approach. These was substantial research undertaken to develop the method (Rahman, et al., 2015) but further improvements may be possible.

A pointer to possible issues with input data are the outliers shown on the figures, with further details in Table 1.  The errors for the arid zone sites on Mount Hope Creek and Richardson River are particularly large.  Investigating these outliers may improve average performance.  There are also other sites that need to be checked.  For example, Reedy Creek at Wangaratta North (403209) is included in the RFFE inputs as a small rural stream, when in fact it is an anabranch of the much larger Ovens River.   Similarly Tuppal Creek in NSW is included but it is an anabranch of the Murray River.

Station_id Station name Area
(km2)
AEP 1% flood
(RFFE input)
(m3/s)
AEP 1% flood
(RFFE out)
(m3/s)
220002 Stockyard Creek at Rocky Hall (Whitbys) 75 3483 411
223214 Tambo River u/s Smith Ck 32 8 41
235209 Aire River @ Beach Forest 21 2769 110
407236 Mount Hope Ck @ Mitiamo 1629 16600 616
415257 Richardson River @ Donald 1831 319 1780

With the RFFE being recommended by Australian Rainfall and Runoff, and being widely used in Victoria, undertaking work to improve its accuracy is important.

Code for analysis and figures is available as a gist.

References

Rahman, A., Haddad, K. Haque, M., Kuczera, G. Weinmann, E. (2015) Project 5 Regional Flood Methods: Stage 3.  (https://web.archive.org/web/20170505000129/http://arr.ga.gov.au/__data/assets/pdf_file/0004/40468/ARR_Project-5_Stage-3_report.pdf)

On the accuracy of the RFFE in NSW

The Regional Flood Frequency Estimation (RFFE) web app was developed as part of the revision to Australian Rainfall and Runoff.  The app allows the estimation of design flood magnitudes anywhere in Australia using a simple web interface (Rahman et al., 2015).

Recent work in NSW allows an assessment of the accuracy of the RFFE in comparison with flood frequency estimates from stream gauge data.  WMA Water analysed the flow records from 160 gauges using at-site flood frequency analysis which provides an independent dataset to compare with RFFE design flood values (Podger et al., 2019).  Here, I’ve focused on the 1% AEP flood.  Although there is often substantial uncertainty in estimating the magnitude of the 1% flood from  the typically short records available, it is the most widely used design flood value, so it is important to quantify its reliability.

A comparison of the RFFE estimates with at-site values is shown in Figure 1.   The figure shows the paired values compared to the 1:1 line.  The RFFE over-predicts larger values with a smooth line through the data deviating to the right of this line.  The bias (average of the differences FFA – RFFE) is -405 m3/s, the RMSE is 1460 m3/s and the Nash-Sutcliffe coefficient is -1.29. The variance increases as the magnitude of the 1% AEP values increase.

The results look better when log transformed, appearing to lie closer to the 1:1 line (Figure 2) and the Nash-Sutcliffe coefficient is 0.39.  The over-prediction is still apparent above about 1000 m3/s.

Another way of comparing these data is by using a Bland-Altman plot, also known as a Tukey mean-difference plot.  Where the variance increases with the mean, as is the case for this data set, the average of the paired values is plotted on the x-axis and the log of the ratio of the values is plotted on the y-axis (Bland and Altman, 1999) (Figure 3).

Figure 3: Bland Altman plot comparing at-site and RFFE values.  Outliers are highlighted; the y-axis has been log transformed

The bias (average log ratio) is 0.88. If the RFFE estimates where multiple by 0.88 they would, on average, be equal to the at-site values.

The approximate limits of agreement between the datasets can be calculated as the mean of the log ratios +/- two times the standard deviation of the log ratios.  For this data, the limits are 4.53 and 0.17.  This means that on average, 95% of the time, the at-site values fall between 4.53 times the RFFE value, and 0.17 times the RFFE value.

Many estimates will be more accurate than this at individual sites but overall these aren’t great results so it’s worth thinking about how to improve the accuracy of the RFFE.  One way would be to address the outliers as they have a major influence on the average accuracy.  These outliers are labelled on Figure 3 as gauge numbers 210084 and 212040. Removing both outliers changes (improves) the upper and lower limits of agreement to 3.63 and 0.19 but increases the bias to 0.84.   The bias increase because the over-prediction of the RFFE values becomes more apparent if the two outliers are removed.

Some information on the outliers are provided in Table 1.  For Glennies Ck (210084) the RFFE underestimates the 1% AEP at-site value by a factor of 46.  This site is challenging because a large dam was constructed in 1983 immediately upstream which has greatly reduced peak flows.   The RFFE is not intended to provide accurate  peak flow estimates where there are large upstream storages so perhaps this site should not be included.  I don’t know of any particular issues with the other gauge, 212040.

Table 1: Details of outliers shown on Figure 3
Gauge No Site River Catchment area
km-square
RFFE
(1% AEP)
(m3/s)
At-site
(1% AEP)
(m3/s)
210084 The Rocks No. 2 Glennies Ck 227 130 5985
212040 Pomeroy Kialla Ck 96 138 1819

The accuracy of the RFFE would also be improved if there was better input data.  The flood frequency information from the 160 sites studies by WMA Water could be included in the tool, along with information from flood studies where there has been a thorough analysis of flood behaviour and generation of accurate flood frequency information. It is important that we work to improve the RFFE so that it provides the best possible design flood estimates.  Code to undertake the analysis and create the graphs is available as a gist.

References

Bland, J. M. & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 1(8476), 307–310. (Available from (https://web.archive.org/web/20180615083238/https://www-users.york.ac.uk/~mb55/meas/ba.pdf)

Bland, J. M., & Altman, D. G. (1999). Measuring Agreement in Method Comparison Studies — A Review. Statistical Methods in Medical Research, 8, 135–160. https://doi.org/10.1007/0-8176-4422-9_13

Haddad, K., Rahman, A., Weinmann, P. E. (2010). Stream flow data preparation for regional flood frequency analysis : Lessons from southeast Australia. Australian Journal of Water Resources, 14(1)17-32. (link to abstract)

Podger, S., Babister, M., Trim, A., Retallick, M. and Adam, M. (2019) Review of ARR design inputs for NSW.  WMA water for office of environment and heritage. (https://web.archive.org/web/20190608201640/https://www.environment.nsw.gov.au/-/media/OEH/Corporate-Site/Documents/Water/Floodplains/review-australian-rainfall-runoff-design-inputs-nsw.pdf)

Rahman, A., Haddad, K. Haque, M., Kuczera, G. Weinmann, E. (2015) Project 5 Regional Flood Methods: Stage 3.  (https://web.archive.org/web/20170505000129/http://arr.ga.gov.au/__data/assets/pdf_file/0004/40468/ARR_Project-5_Stage-3_report.pdf)

Legacy, rather than adequacy, drives the selection of hydrology models

A comment on:

Addor, N. and Melsen, L. A. (2019) Legacy, rather than adequacy, drives the selection of hydrology models. Water Resources Research 55:378-390.

Lots of interesting things in the paper which reports on use of 7 rainfall runoff models as documented in 1,529 peer-reviewed papers.

Modellers use the model they are familiar with rather than looking to find the model that best fits their problem.  Addor and Melsen found that in 74% of published studies, the chosen model could be predicted based solely on the affiliation of the first author.  Models are usually associated with particular regions where they dominate.   The same model was used for a wide variety of purposes, in various landscapes and at a range of temporal and spatial scales.

The reason models are selected for legacy reason is that “it works” i.e modelling gets done and papers get published in peer reviewed journals.  Over time, hydrologists develop deep understanding of their models and develop ways to become highly productive.  They also come to perceive their models are adequate to tackle the problems at hand.

2.  Model evaluation

The Nash-Sutcliffe Coefficient, although widely used, isn’t necessarily a good measure of model performance.

An example of the weight of legacy in hydrology is the use of the Nash-Sutcliffe Efficiency to evaluate models.  With the advent of automated calibration techniques, models can provide high NSE values in most catchments (emphasis added).  Yet studies relying on more process-oriented metrics, stricter evaluation procedures and multiple data sets have demonstrated that the resulting simulations often involve unrealistic features and lack of robustness.

3.  Are modellers missing opportunities?

Would better results be possible with a different choice of model? Perhaps.

First, there are large costs in setting up and using a model that is not familiar.  Second, there is little guidance on selecting a model for a given purpose or landscape.  Inter-comparisons of models that are applied to a specific task are rare. Third, prior to using a new model, it is not clear what the advantages would be compared business as usual.  The upshot is that there are lots of risks in adopting a new model and rewards are uncertain.

However we are missing out on opportunities to improve models.  Innovations tend to occur within the silo of single models rather than being adopted across the whole modelling field.

Tools, such as RRL (Rainfall Runoff Library) help tackle the problem of using and comparing a range of models.  The RRL provides a modelling framework that facilitates setting up models and then using the same fitting methods and objective functions.

4. Seven widely used conceptual rainfall runoff models

The paper focuses on 7 widely used models:

1. HBV – Hydrologiska Byråns Vattenbalansavdelning model
2. VIC – the Variable Infiltration Capacity model
3. Sacramento – soil moisture accounting model
4. GR4J – Génie Rural 4 model
5. mHM – Mesoscale Hydrological Model
6. TOPMODEL – TOPography based hydrological model
7. PRMS – Precipitation Runoff Modelling System

The regions where these models are used are mapped in the paper.  International graduate students learn about a model where they study and then apply it to local problems on their return.  This has facilitated the spread of models, particularly from the US which is the destination for many international graduate students.

All these models (except mHM) are shown as being used in Australia but no model is dominant.  However there was no mention of  the ‘Australian’ rainfall runoff models AWBM, SimHyd or IHACRES.  A quick search of Informit suggests that AWBM is the most cited model in Australian conferences and journals, followed by SimHyd then Sacramento, GR4J, IHACRES, and TOPMODEL.   HBV and PRMS are not mentioned in Informit but the Addor and Melsen paper suggests there are a few Australian examples of their use.

At present there are 5 models in the RRL: AWBM, Sacramento, SimHyd, SMAR and Tank.  It would be nice to think the capability could be expanded to take in more models, but its hard to see this becoming a priority. There has been little work on the RRL since 2004 during the exciting times, for hydrologists, of the CRCCH.  We are probably stuck with the situation where the model that gets used, largely depends on who gets the modelling job.

ARR point temporal patterns

Previous posts (here and here) have looked at Australian Rainfall and Runoff areal temporal patterns and identified some that appear to be unusual and may lead to problems of embedded bursts. This post reviews point patterns.  Point patterns are suitable for modelling smaller catchments; those with areas less than 75 km2.

Embedded bursts occur when the rainfall associated with a period within a pattern is rarer than the event represented by the pattern as a whole.  Whether embedded bursts will be an issue depends on local values of design rainfall so patterns need to be checked whenever hydrological models are applied to a new location.  These checks are available in many hydrologic modelling programs.

Here I’ve used an approximate method to identify that patterns where there is potential for embedded bursts.  This involves the calculation of a non-uniformity index (NUI) (explained here).

The top ten least uniform point patterns are shown in Table 1 with the top 8 graphed in figures 1 and 2.

Table 1: Least uniform point temporal patterns
EventID Durtation (min) Duration (hr) Time step (min) Region AEP NUI Source Region
1208 8640 144 180 Rangelands (West) frequent 419 Rangelands (West)
1182 10080 168 180 Rangelands (West) intermediate 416 Rangelands (West)
1973 10080 168 180 Rangelands frequent 375 Rangelands
1183 8640 144 180 Rangelands intermediate 357 Rangelands
1181 8640 144 180 Rangelands (West) intermediate 357 Rangelands (West)
2870 10080 168 180 Central Slopes frequent 342 Central Slopes
1083 10080 168 180 Rangelands intermediate 341 Rangelands
1079 10080 168 180 Rangelands (West) intermediate 341 Rangelands (West)

Figure 1: Top 4 least-uniform patterns

Figure 2: Next 4 least uniform patterns

All of these are long patterns, 144 hours or 168 hours and contain extensive periods of little rain with isolated bursts.  Nine of 10 patterns are from rangelands; arid areas where long rainstorms are relatively rare.  Deciding on representative patterns for modelling long duration events in these areas is challenging.

Large peaks 24 hours apart

Analysis of areal patterns showed here were some that had large peaks 24 hours apart which could have been caused by accumulated rainfall in pluviograph records that had not been processed correctly.  The same analysis was undertaken for point patterns.  The four patterns with largest peaks 24 hours apart are shown in Figure 3.  None of these appears to be particularly unusual.

Figure 3:  Point patterns were large peaks are separated by 24 hours

In summary, point patterns do not appear to have the problems that were identify for areal patterns.  Modellers still need to check for embedded bursts which depend on local values of design rainfall.

Code to reproduce the figures and analysis is available as a gist.

ARR areal temporal patterns – II

The previous post looked at some areal temporal patterns from the ARR data hub that seemed odd, with a few large peaks.  Matt Scorah pointed out that the peaks were commonly 24 hours apart so could be caused by accumulated rainfall totals in pluvio records.

This raises the question of whether there are other suspect patterns, with large peaks 24 hours apart.  I searched for patterns where the largest and second largest values were greater than 15% and the difference in time between these values was 24 hours.  Most of the patterns in the previous blog showed up but there were a few others.  The two standouts are patterns 5829 and 5732  (Figure 1) which, as before, originate from the Murray Basin in the area between Canberra and Cooma.  These patterns also occur as Central Slopes eventIDs 4049 and 3951.

Figure 1: Areal temporal patterns with large peaks 24 hours apart.

Code for the analysis is available as a gist

Check ARR temporal patterns before you use them

Recently, I had a task to troubleshoot some odd results from a hydrologic model.  High flood discharges were being produced from very long duration events which was not expected.  The issue came down to a few unusual temporal patterns.

The recommendation in Australian Rainfall and Runoff is that an ensemble of 10 temporal patterns are used for modelling.  These depend on the region, catchment area and duration and are available from the ARR Data Hub.  The temporal pattern regions are shown in Figure 1.

There are two types of patterns;  point and areal.  Areal patterns are appropriate when the catchment area exceeds 75 km2. This blog only considers areal patterns.

Figure 1: Regions used for Temporal Patterns (ARR Book 2, Figure 2.5.7)

An example of an unusual pattern is as follows.  The 10 areal temporal patterns for the southern slopes (mainland) for 120 hours duration for 200 km2 catchment area are shown in Figure 2.  Pattern 7 (event ID 6884) stands out  because of the small number of large values.  The largest peak in the pattern implies that 48.82% of the 120 hour rainfall occurs in a single 180 min (3 hour) increment.  This is likely to cause an intense embedded burst that may produce misleading results from modelling.

Figure 2: Ensemble areal temporal patterns for Southern Slopes (mainland) for 120 hours duration, and catchment area of 200 km2

Embedded bursts

“Embedded bursts” occur when a period of rainfall within a temporal pattern has an annual exceedance probability rarer than the burst as whole.  Embedded bursts can cause issues with modelling and modellers should consider if they should be removed or smoothed (see Book 2, Section 5.2.1 and 5.9.4; Scorah et al., 2016).  We can check if pattern 7 is likely to lead to such a problem.

The areal temporal patterns in Figure 2 are for the Southern Slopes region which includes Melbourne (Figure 1), so let’s use the Melbourne CBD as an example (lat = -37.8139, lon = 144.9633).  The 1 in 100, 120 hour rainfall for Melbourne is 194 mm (from the BoM).  Applying pattern 7 means there would be a 3 hour period when 48.82% of this rainfall occurs i.e. 94.7 mm in 3 hours.   Looking at the IFD data for Melbourne shows that an event of 94.7 mm in 3 hours has an AEP of between 1 in 500 and 1 in 1000 years.  So the 1 in 100, 120 hour storm contains a very rare 3 hour embedded burst.  If we run a model using this pattern, the results are likely to depend on this one 3 hour period of rain rather than being indicative of the long 120 hour storm what was of interest.

Where did this pattern come from?

The creation of design areal temporal patterns is explained in ARR Book2, Chapter 5.6.  They are scaled from real storms but if there were insufficient patterns for a particular region then patterns may be donated from neighbouring regions.  For example, the Southern Slopes region can take patterns from the Murray Basin (see ARR Book 2, Table 2.5.10); pattern 7 is an example of this.   From the data hub, it is possible to download a file that contains information about the storms that were used to create the patterns.  For the southern slopes region, this file is Areal_SSmainland_AllStats.csv.   Looking at pattern 7 (Event ID 6884) shows this pattern came from Jerangle, in the Murray Basin, 100 km south of Canberra (lat = -35.8875, lon = 149.3375).   IFD information for this location shows that the 1 in 100, 120 hour storm is 336 mm.  Pattern 7 means that 48.82% of this rain (164 mm) falls in 3 hours.  For the Jerangle region this is much larger than a 1 in 2000 year event (the 1 in 2000 year, 3 hour rainfall is 96.7 mm).  This suggests that pattern 7 is based on a very unusual storm so may not be appropriate for modelling “ordinary risks” such as those associated with 1% events.  Another possibility is that there is an error in the data.

Searching for other odd patterns

Finding a single strange pattern naturally leads the problem of searching for others.  Whether a rare embedded burst is present depends on both the temporal pattern and the rainfall characteristics of a location but it was not practical to obtain and process pattern and IFD data for the whole of Australia.  Instead a non-uniformity index (NUI) was developed which is a measure of the difference between an ARR temporal pattern and a uniform pattern (which has the same rainfall in each increment).  The details of the NUI are provided below.  A high value of the NUI won’t necessarily indicate an issue with embedded bursts but will identify patterns that require checking.  The NUI was calculated for all 10,692 areal temporal patterns on the data hub.  The top 17 least-uniform patterns are listed in Table 1.

The first 3 entries are actually the same pattern that originated in the Murray Basin and which has been used in the neighbouring regions of Central Slopes and Southern Slopes (mainland).  This next two patterns are the same and also originated in the Murray Basin and there are similar examples throughout the table.  Of the top 17 least-uniform patterns, 15 are from the Murray Basin and all originated in the Jerangle area in 5 separate but nearby locations as mapped on (Figure 3).  The pattern listed in row 14 originated near Charleville, Qld about 600 km west of Brisbane and the final pattern in the table, from the Monsoonal North is based on a storm from Rum Jungle, 100 km south of Darwin, NT.

Table 1: Least uniform areal temporal patterns
No. Event ID Duration (hour) Region NUI Source region Area Lat Lon
1 4207 120 Central Slopes 481 Murray Basin 200 -35.89 149.34
2 5989 120 Murray Basin 481 Murray Basin 200 -35.89 149.34
3 6884 120 Southern Slopes (mainland) 481 Murray Basin 200 -35.89 149.34
4 4119 96 Central Slopes 385 Murray Basin 200 -35.89 149.34
5 5901 96 Murray Basin 385 Murray Basin 200 -35.89 149.34
6 5644 36 Murray Basin 378 Murray Basin 500 -35.91 149.46
7 3950 48 Central Slopes 347 Murray Basin 500 -35.96 149.51
8 5731 48 Murray Basin 347 Murray Basin 500 -35.96 149.51
9 4116 96 Central Slopes 335 Murray Basin 200 -35.94 149.49
10 5898 96 Murray Basin 335 Murray Basin 200 -35.94 149.49
11 6797 96 Southern Slopes (mainland) 335 Murray Basin 200 -35.94 149.49
12 4029 72 Central Slopes 307 Murray Basin 200 -35.91 149.41
13 5810 72 Murray Basin 307 Murray Basin 200 -35.91 149.41
14 4170 96 Central Slopes 252 Central Slopes 10000 -26.39 147.71
15 4043 72 Central Slopes 229 Murray Basin 500 -35.96 149.51
16 5822 72 Murray Basin 229 Murray Basin 500 -35.96 149.51
17 5004 96 Monsoonal North 210 Monsoonal North 100 -13.06 130.96

Figure 3: Map showing the origins of 15 of the 17 patterns in Table 1.  This area is about 100 km south of Canberra, ACT

The patterns listed in Table 1 are plotted as follows. Murray Basin pattern 5989 (row 2) is shown in Figure 2 as pattern 7 (southern slopes eventID 6884).  The next 4 Murray Basin patterns from Table 1 (rows 5, 6, 8 and 10) are shown in Figure 4 below.  All have similar characteristics with one increment producing 40% to 60% of the rain for the whole pattern.  Figure 5 shows the next lot of patterns (rows 13, 14, 16 and 17 in Table 1).  By the time we get down to row 17 (Monsoonal North eventID 5004) the pattern appears closer to expectations.

Figure 4: Non-uniform patterns specified in Table 1 rows 5, 6, 8 and 10

Figure 5: Non-uniform patterns specified in Table 1 rows 13, 14, 16 and 17

At least to me, its not clear why a range of highly non-uniform patterns should all originate in the same area of NSW (Figure 3), but it does suggest the original data should be checked.  In the interim, modellers could consider excluding these patterns and running ensemble or Monte Carlo analyses without them.

I haven’t looked at point temporal patterns but it would seem appropriate to check that they are reasonable and suitable for modelling.

Analysis is available as a gist.

References

Scorah, M., P. Hill, S. Lang and R. Nathan (2016). Addressing embedded bursts in design storms for flood hydrology 37th Hydrology & Water Resources Symposium. Tasmania, Engineers Australia. (link)

Appendix: NUI non-uniformity index

A non-uniformity index was used to rank patterns and find the least uniform.  These were likely to contain rare embedded bursts.  The approach was to use the chi-squared probability calculated using a quantile based on the difference between a particular temporal pattern and a uniform pattern. The number of degrees of freedom was the number of increments in the pattern minus one.

The quantile was calculated as:

$Q = \sum \frac{(P_p - P_u)^2}{P_u}$

Where

• $P_p$ is the proportion of rainfall in a increment in the pattern
• $P_u$ is the proportion of rainfall in an increment in a uniform pattern
• The sum is over all the increments in the pattern
• Degrees of freedom (df) is the number of increments less one

From the value of $Q$, and df, find the probability $p$.  I used the pchisq function in R do this.
This probability very close to one for the highly non-uniform patterns so was transformed to be in a range that is easy to work with with maximum values about 500 for the most extreme patterns listed in Table 1.

$\mathrm{NUI} = -log(-log(p))$