Category Archives: R

Victoria won the hydrologic games, and they may not have cheated

As usual, the hydrologic games were held during the HWRS2018 conference dinner, and this time Victoria won.  Competitors threw a dart to select an initial loss and rainfall temporal pattern which were used as input to a RORB model to simulate a 1% AEP peak flow.  The team with the highest average peak was the winner.


Figure 1: Results of the hydrologic games (Source: A post by Ben Tate on LinkedIn)

Before the games started, I thought one of the small teams would win. There were only two international competitors and six from Tasmania, but likely over 100 from Victoria. Extreme results are more likely from small samples, so I expected the small teams to have both the highest and lowest scores, with the Victorian team in the middle, near the true mean.

There is a nice discussion of the effect of sample size on variation in the wonderful book Thinking Fast and Slow by Daniel Kahneman; see the start of Chapter 10, “The law of small numbers”. Howard Wainer’s article The Most Dangerous Equation covers the same ground. Wainer argues that misunderstanding of this effect has caused confusion and wasted effort. For example, the small-school movement was based on the idea that students performed better in smaller schools than in larger ones. In the US, grants from the Gates Foundation were used to convert large schools into several smaller schools. However, evidence presented by Wainer shows the small-school effect is likely the result of the extra variation that arises when only a small number of students are assessed. Comparisons of many schools showed the smallest schools achieving both the highest and the lowest scores.

Back to the hydrologic games.  How likely is it that the mean of a large sample, the Victorians, would be higher than all the means from a number of smaller samples?  I don’t know how many people participated in the hydrologic games, so let’s guess.  Ben says there were “over 150 players”; I’ve divided these up as shown in Table 1.


Table 1: Teams for the hydrologic games


Now, to simulate the game we draw random numbers from the normal distribution. The Victorian score will be the mean of 100 random numbers, the NSW score will be the mean of 15 random numbers, and so on. Victoria wins if its mean exceeds the means of all the other teams.

Using this approach, the probability of Victoria winning is estimated to be 0.00830 (1 in 120), based on the mean of 100 sets of 100,000 simulations. So it’s less likely than seeing a 1% AEP flood in a given year. If a hydrologist attends 20 hydrologic games during their career, and this game was played every time, the probability that they would see Victoria win at least once is about 15%.
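
The simulation described above can be sketched as follows. This is a minimal Python version, not the gist itself (which is in R). Table 1 is not reproduced here, so only the sizes for Victoria (100), NSW (15), Tasmania (6) and the international team (2) come from the text; the other teams and their sizes are made up for illustration, which means the estimate will not necessarily match 0.00830. The sketch also takes a shortcut: the mean of n standard normal values is itself normal with standard deviation 1/sqrt(n), so each team mean is drawn directly.

```python
import random

# Team sizes. VIC, NSW, TAS and INT are from the text; QLD, WA, SA and ACT
# are hypothetical placeholders because Table 1 is not available here.
TEAM_SIZES = {"VIC": 100, "NSW": 15, "QLD": 12, "WA": 8,
              "SA": 6, "TAS": 6, "ACT": 4, "INT": 2}


def victoria_win_probability(team_sizes, n_sims=200_000, seed=1):
    """Estimate the probability that VIC has the highest team mean."""
    rng = random.Random(seed)
    # The mean of n standard normal draws has standard deviation 1/sqrt(n).
    sds = {team: n ** -0.5 for team, n in team_sizes.items()}
    wins = 0
    for _ in range(n_sims):
        vic = rng.gauss(0.0, sds["VIC"])
        # Victoria wins only if its mean beats every other team's mean.
        if all(vic > rng.gauss(0.0, sd)
               for team, sd in sds.items() if team != "VIC"):
            wins += 1
    return wins / n_sims


def chance_in_career(p_single, n_games=20):
    """Probability of at least one win in n_games independent games."""
    return 1.0 - (1.0 - p_single) ** n_games


if __name__ == "__main__":
    print(victoria_win_probability(TEAM_SIZES))
    print(chance_in_career(0.0083))  # about 0.15, as quoted in the text
```

The second function is the arithmetic behind the career estimate: with a single-game probability of 0.0083, the chance of at least one Victorian win in 20 games is 1 - (1 - 0.0083)^20, which is about 0.15.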

The upshot is that when Victoria won, and provided the games were fair, we witnessed a rare, but not impossible event.

Calculations are available as a gist.



Flow duration curves

Flow duration curves show the percentage of time in a flow record that flow exceeds a particular value. They are a long-standing tool of hydrological analysis and their construction and uses are thoroughly documented in two papers by Vogel and Fennessey (1994; 1995). Flow duration curves can be based on daily, monthly or annual flows and constructed for periods of interest such as a whole year, seasons, high or low flow periods, or, for example, a period of fish migration. They may use the whole period of record, or parts of the record, to analyse variability.

Basic flow duration curves are easy to create but more complicated versions are sometimes useful.  An example for the Broken River at Casey Weir (gauge 404216) is shown in Figure 1.  Here the annual flow duration curves are shown in grey for each year of record from 1973 to 2006.  The y-axis is log-transformed and the x-axis uses a probit scale. Wet years (1974, 1993) and a dry year (1976) are coloured, highlighting the substantial variation in annual flow.  In the driest year there was no flow for 50% of days.  The average flow duration curve for all years is shown in black.
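
As a rough illustration of how a basic flow duration curve is built, here is a short Python sketch (the gist for the figure is separate and in R). The Weibull plotting position, rank / (n + 1), is one common choice and is an assumption here, not necessarily what was used for Figure 1. The function names are made up for this example. The `probit` helper shows the kind of transform a probit x-axis applies to the exceedance percentage.

```python
from statistics import NormalDist


def flow_duration_curve(flows):
    """Return (exceedance_percent, flow) pairs, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position (an assumed choice): rank / (n + 1).
    return [(100.0 * rank / (n + 1), q)
            for rank, q in enumerate(ranked, start=1)]


def percent_time_exceeded(flows, threshold):
    """Percentage of the record in which flow is above the threshold."""
    return 100.0 * sum(q > threshold for q in flows) / len(flows)


def probit(exceedance_percent):
    """Standard normal quantile of an exceedance percentage (0 to 100, exclusive)."""
    return NormalDist().inv_cdf(exceedance_percent / 100.0)


if __name__ == "__main__":
    flows = [0.0, 0.0, 5.0, 10.0, 20.0]  # made-up daily flows
    for pct, q in flow_duration_curve(flows):
        print(f"{pct:5.1f}%  {q:6.1f}  probit={probit(pct):6.2f}")
    print(percent_time_exceeded(flows, 0.0))
```

Calling `flow_duration_curve` once per year and once for the whole record gives the data for a plot of annual curves with an overall curve on top. Zero flows need separate handling before a log-transformed y-axis is applied.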


Figure 1: Flow duration curves for the Broken River at Casey Weir (404216) (1973-2016)

Code to produce this figure is available as a gist.


Vogel, R. M. and N. M. Fennessey (1994) Flow-duration curves. I: New interpretation and confidence intervals. Journal of Water Resources Planning and Management-ASCE 120(4): 485-504.

Vogel, R. M. and N. M. Fennessey (1995) Flow duration curves II: A review of applications in water resources planning. Water Resources Bulletin 31(6): 1029-1039.



Comparing peaks at stream gauges

In hydrology, it is often useful to compare peak flow values at neighbouring streamflow gauges. This is important for data checking and to see if it is feasible to use data from one gauge to infill missing values in another. In reports and student projects, I’ve seen the comparison presented as a time-series of columns (Figure 1). This figure shows annual peaks between 1980 and 1990 for two streams near Wangaratta in northeast Victoria: Boggy Ck and Fifteen Mile Creek.

It is usually better to plot the data as paired values (Figure 2). This allows a more direct comparison of the peaks and reveals the potential for establishing a useful relationship between flows at the two gauges. In Figure 2, the value for 1981 is an outlier. The flow is much larger and the ratio of flows between the two gauges looks different to the other years. A check of the flow record shows that the 1981 peak for Boggy Ck has quality code 104, “Records manually estimated”. To increase confidence in the magnitude of this large peak, we could investigate further, for example by checking the high flow rating, looking at the flow at other neighbouring gauges, and assessing the accuracy of the manual estimation method.
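
Pairing the peaks is simple to do in code. The Python sketch below matches annual peaks by year and then compares each year's ratio with the median ratio, which is one simple way (my choice for this illustration, not a method from the gist) of flagging a year like 1981 for a closer look. The flow values are invented for the example and are not the gauge data.

```python
from statistics import median


def pair_by_year(peaks_a, peaks_b):
    """Return (year, peak_a, peak_b) for years present at both gauges."""
    years = sorted(set(peaks_a) & set(peaks_b))
    return [(year, peaks_a[year], peaks_b[year]) for year in years]


def ratio_to_median(pairs):
    """Each year's a/b ratio divided by the median a/b ratio."""
    ratios = {year: a / b for year, a, b in pairs}
    typical = median(ratios.values())
    return {year: r / typical for year, r in ratios.items()}


if __name__ == "__main__":
    # Invented annual peaks, for illustration only.
    gauge_a = {1980: 10.0, 1981: 90.0, 1982: 12.0, 1983: 8.0}
    gauge_b = {1980: 20.0, 1981: 60.0, 1982: 24.0, 1984: 30.0}
    pairs = pair_by_year(gauge_a, gauge_b)
    print(pairs)
    print(ratio_to_median(pairs))
```

Years with a relative ratio far from 1 are candidates for the checks described above; the pairs themselves are what goes on the two axes of a plot like Figure 2.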


Figure 1: Comparison of annual peak flows as a time-series of columns



Figure 2: Comparison of annual peak flows as paired values (same data as shown in Figure 1)

Code to produce these graphs is available as a gist.

Comparing rainfall measurements with a heat map

How to compare rainfall measured at two gauges? One method I came across in a report was to make a table that showed the difference in totals for each month of each year in an overlapping period. An example is shown in Table 1 below. This provides some information but it’s not easy to see what’s going on.
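
A table of this kind can be built along the following lines. This is a small Python sketch with invented rainfall totals; the data layout (dictionaries keyed by year and month) and the function name are assumptions made for the example.

```python
def monthly_differences(gauge_a, gauge_b):
    """Difference in monthly totals (a minus b) for the overlapping period.

    Inputs are dicts keyed by (year, month) with monthly rainfall in mm.
    Returns a nested dict: table[year][month] = difference in mm.
    """
    table = {}
    for year, month in sorted(set(gauge_a) & set(gauge_b)):
        diff = gauge_a[(year, month)] - gauge_b[(year, month)]
        table.setdefault(year, {})[month] = diff
    return table


if __name__ == "__main__":
    # Invented monthly totals (mm), for illustration only.
    a = {(2000, 1): 50.0, (2000, 7): 80.0, (2001, 1): 30.0}
    b = {(2000, 1): 60.0, (2000, 7): 60.0, (1999, 12): 10.0}
    print(monthly_differences(a, b))
```

Each row of the result is a year and each column a month, which is the layout needed for colouring the cells by the sign and size of the difference.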


Table 1.  Comparison of rainfall at two gauges

It is straightforward to turn a table of values like this into a heat map (Figure 1), where it’s easy to see patterns. The July column is mostly green (one gauge gets more rain in the austral winter) and the January column is mostly purple (summers are drier at one site).


Figure 1.  Data from Table 1 as a heat map

If it’s important to look at the individual values, these can be overlaid on the coloured tiles in light grey so the patterns are not obscured.


Code to produce these figures is available as a gist.


Visualising Hydrologic Data

This is the companion webpage for my paper at the 2018 Hydrology and Water Resources Symposium, Visualising Hydrologic Data.

How to make figures from the paper and presentation



Graph catalogues (catalogs)



Some interesting papers and books

  • Wainer, H., Friendly, M. and Millan P. (2015) Graphs ‘R Us’: A discussion of Anthony Unwin’s Graphical Data Analysis with R.  Journal of Educational and Behavioral Statistics 40(6):665-670. (copy at ResearchGate).  This paper considers the relative merits of 16 graphics books.
  • Weissgerber et al. (2015) Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol 13(4):e1002128
  • Nathan Yau’s books
  • Edward Tufte’s books



TQmean, a measure of the impact of urbanisation on flow

Urban development has a profound impact on flow, and hydrologic indicators have been proposed to highlight the changes as suburbs spread over a catchment, increasing impermeable area. A commonly used measure of impact is TQmean: the proportion of time that flow in each year is greater than the mean flow for that year. This decreases with urbanisation, and has been shown to be linked to the ecological condition of a stream (Booth et al., 2004).

As an example, consider two neighbouring streams in eastern Melbourne: Brushy Creek, which flows through the suburb of Croydon and has a catchment that is 28% impervious, and Olinda Creek, which has a mainly forested catchment that is 5% impervious.

Often, the value of TQmean is calculated for each year and then averaged over several years. Using this approach, the TQmean for Brushy Creek is 0.21, while the less urbanised Olinda Creek has a value of 0.37 (for the period 1988 to 2016). Using the relationships in Booth et al. (2004), Olinda Creek would be predicted to have ‘good’ biological condition, while Brushy Creek would be predicted to be ‘very poor’.
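
The calculation can be sketched in a few lines of Python (the gist is in R and may differ in detail). The function names and the two short flow series are invented for the example; real use would pass in the daily flows for each year of record.

```python
from statistics import mean


def tqmean_for_year(daily_flows):
    """Proportion of days in one year with flow above that year's mean."""
    year_mean = mean(daily_flows)
    return sum(q > year_mean for q in daily_flows) / len(daily_flows)


def average_tqmean(flows_by_year):
    """Average of the annual TQmean values over several years."""
    return mean(tqmean_for_year(flows) for flows in flows_by_year.values())


if __name__ == "__main__":
    # Invented series: a flashy record and a steadier one.
    flashy = [0.0] * 9 + [10.0]   # one short burst lifts the mean
    steady = [1.0, 2.0, 3.0, 4.0]
    print(tqmean_for_year(flashy))
    print(tqmean_for_year(steady))
    print(average_tqmean({1988: flashy, 1989: steady}))
```

The flashy series shows why the indicator falls with urbanisation: a few short, large peaks raise the mean, so flow sits above the mean for only a small part of the year.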

Calculating a single value of TQmean is instructive, but urbanisation also alters the times of the year when flows exceed the mean. Using TQmean as the metric, urbanisation results in high flows occurring more often, but for shorter periods that are dispersed throughout the year. Comparing the plots of Olinda Creek and Brushy Creek in Figure 1, higher flows (flows above the mean) are clustered in the winter (June to August) for Olinda Creek. There is more winter runoff because the Olinda Creek catchment wets up, owing to the higher rainfall and reduced evaporation that occur seasonally in this area. For Brushy Creek, with the same climate, short bursts of high flow occur throughout the whole year. This can be attributed to runoff from impervious surfaces, which will occur any time there is rain.
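
One way to put numbers on "more often, but for shorter periods" is to count the spells of consecutive days above the mean and record their lengths. The Python sketch below does this with a run-length grouping. It is an illustration with invented flows, not the method used to draw the figure.

```python
from itertools import groupby


def spells_above_mean(daily_flows):
    """Lengths (in days) of each run of consecutive days above the mean."""
    flow_mean = sum(daily_flows) / len(daily_flows)
    return [len(list(run))
            for above, run in groupby(daily_flows, key=lambda q: q > flow_mean)
            if above]


if __name__ == "__main__":
    flows = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 9.0, 1.0]  # invented values
    spells = spells_above_mean(flows)
    print(len(spells), spells)
```

A stream with many short spells spread through the year behaves like the Brushy Creek description; a stream with a few long spells in winter behaves like Olinda Creek.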



Figure 1:  Periods of flow above and below the mean flow for (A) Olinda Ck and (B) Brushy Ck

A key issue revealed by this analysis is the changed seasonality of high flow, which is a result of urbanisation. This is just one of the many changes in flow regime caused by urbanisation that lead to poor stream condition (Burns et al. 2014).

Code to plot these graphs is available as a gist.


Graphing a long flow series

A long series of flows can be challenging to show graphically without squeezing the data so much that all the useful information is lost (Figure 1).  Two approaches are shown here.  First, a ‘cut-and-stack’ plot, which takes a long graph and cuts it into segments equal to the width of a page.  These segments are stacked on top of each other, stretching out the x-axis (Figure 2).  The figure shows the flows for each decade of the ~ 50 years of data for the Broken River at Caseys Weir (Gauge 404216).
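
The cutting step can be sketched as below, in Python for illustration (the linked code is in R). Splitting on calendar decades is an assumption; the segments could equally start at the first day of the record. Each segment would then be drawn as its own panel, stacked vertically.

```python
from collections import defaultdict
from datetime import date


def cut_by_decade(dates, flows):
    """Group (date, flow) pairs into calendar decades, e.g. 1970, 1980."""
    segments = defaultdict(list)
    for day, flow in zip(dates, flows):
        decade = day.year // 10 * 10
        segments[decade].append((day, flow))
    return dict(sorted(segments.items()))


if __name__ == "__main__":
    # Invented three-day record straddling a decade boundary.
    days = [date(1979, 12, 30), date(1979, 12, 31), date(1980, 1, 1)]
    segments = cut_by_decade(days, [1.0, 2.0, 3.0])
    for decade, values in segments.items():
        print(decade, len(values))
```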

An alternative is a trellis or facet plot (Figure 3).  Here, the flow in each year is plotted as a separate graph.  If the y-axis scale is held constant across all years, the overall temporal variation is highlighted and the very dry years stand out (for example 2006-2009).

If the scales are varied for each year, the seasonal flow patterns are emphasised (Figure 4). The transfers from upstream dams stand out as rectangular hydrographs in 1977, 1982, 1983 and the summer of 1983-84.
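
The difference between the two facet plots comes down to how the y-axis limits are chosen for each panel. A small Python sketch of that choice follows; the option names "fixed" and "free_y" are borrowed from the `scales` argument of ggplot2's `facet_wrap`, and the flows are invented.

```python
def facet_y_limits(flows_by_year, scales="fixed"):
    """Return {year: (ymin, ymax)} for each panel of a facet plot.

    scales="fixed"  : every panel shares the overall maximum.
    scales="free_y" : each panel uses its own maximum.
    """
    if scales == "fixed":
        overall_max = max(max(flows) for flows in flows_by_year.values())
        return {year: (0.0, overall_max) for year in flows_by_year}
    return {year: (0.0, max(flows)) for year, flows in flows_by_year.items()}


if __name__ == "__main__":
    flows = {2006: [0.0, 2.0], 2010: [5.0, 50.0]}  # invented values
    print(facet_y_limits(flows, "fixed"))
    print(facet_y_limits(flows, "free_y"))
```

With shared limits a dry year appears as a nearly flat line, which is what makes it stand out; with free limits the same year fills its panel and its seasonal shape becomes visible.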


Figure 1: Broken River at Caseys Weir (404216) 31 March 1972 to 19 April 2017


Figure 2: Cut-and-stack plot of the mean daily flow for the Broken River at Caseys Weir (flow data is the same as shown in Figure 1)


Figure 3: Facet plot of the mean daily flow for the Broken River at Caseys Weir.  Flow data is the same as shown in Figure 1; y-axis scaling is held constant


Figure 4: Same as Figure 3 except that y-axis scaling varies between years


Figure 5: Broken River at Caseys Weir (20 Mar 2017)

Data for gauge 404216 was obtained from the Victorian Water Measurement Information System (WMIS).

R code to produce the graphs in this blog is available as gists (here for the cut-and-stack plot; and here for the facet plots).