# Statistical significance

There has been a long-term debate about statistical significance testing and the use of p-values. In 2019, The American Statistician published a special issue with an editorial: ‘Moving to a World Beyond “p < 0.05”‘.  Hydrologists are generally dilettantes in the statistical world but there are some important lessons for us.

The single key message from the editorial (Wasserstein et al., 2019) is:

“Statistically significant” – don’t say it and don’t use it.

Some don’ts:

• Don’t base your conclusion on whether an association or effect was found to be statistically significant
• Don’t believe that an association or effect exists just because it was statistically significant
• Don’t believe an association or effect is absent just because it was not statistically significant
• Don’t conclude anything about the scientific or practical importance of a finding based solely on its statistical significance (or lack therefore).

Instead, the recommendation is: “Accept uncertainty. Be Thoughtful, Open and Modest” (ATOM).

Accepting uncertainty involves accompanying every point estimate by a measure of its uncertainty.  If we recognise and publicise uncertainty it will provide the motivation to do things better.  Embracing uncertainty will highlight the areas where more work is required.  It also helps us to be modest.  Just think of the huge uncertainty in many of the estimates we use in hydrology: losses, flows from extrapolated ratings, 1% flood levels.

Being thoughtful is about carefully planning studies, designing them well and executing them carefully. It’s also understanding the practical use and application of results and considering multiple approaches for solving problems.  It’s being clear about the level of confidence users should have in results.

Openness in reporting is important.  There is a minimum amount of information that should be provided so that others can understand what was done (see the previous post for further discussion on this).  In the scientific community there is a move to providing sufficient information so that others can undertake an alternative analysis of data.

Being modest means understanding and clearly expressing the limitations of your work and not using unwarranted certainty.

Accepting uncertainty, being thoughtful, open and modest is clearly a worthwhile goal, not just for statisticians, but also for hydrologists.

Reference

Wasserstein, R. L., Schirm, A. L. and Lazer, N. A. (2019) Moving to a World Beyond “p < 0.05”.  The American Statistician, 73: sup1, 1-19, DOI:10.1080/00031305.2019.1583913

# Recommended Practice for Hydrologic Investigations and Reporting

A comment on:

R. J. Nathan, and T. A. McMahon (2017) Recommended practice for hydrologic investigations and reporting.  Australasian Journal of Water Resources 21(1):3-19

This is a very useful paper which sets out a road map of what should be included in hydrologic reports.

A few highlights.

Data

When using rainfall, evaporation and streamflow data, include:

• gauge name and number (for flow data also include the stream name)
• location (latitude and longitude)
• period of record
• amount of missing data
• method used to impute missing data
• (for rainfall data) the method used to check and adjust accumulated values.

The quality of rainfall and evaporation records can be checked using double-mass curve analysis.

For streamflow data, the quality of the rating is important so there needs to be information on:

• number of gaugings used to define the rating curve
• a comparison of the maximum gauged flow to the maximum recorded flow.

Highflow ratings can be checked and adjusted using hydraulic modelling (Tate and Russell, 2014).

My personal observation is that there is often more data available than can be found on the standard websites.  It may be worth chasing older published compilations of data.

Objective functions for modelling

The objective function used to find optimal parameter values for modelling will depend on which aspect of the model results are most important to get right.  Using the standard Nash-Sutcliffe approach may be fine for high flows but may not provide a good fit for low flows.

Model validation

“Whatever length of data is available, an optimisation technique needs to be applied that allows validation of the model parameters using ‘independent’ data.”

Flood frequency analysis

There is good advice on flood frequency analysis in Australian Rainfall and Runoff.  One sage piece of advice by Nathan and McMahon relates to the inclusion of historic events (events that predate the gauging record).  These can be included when fitting probability models (see ARR, Book 3, Section 2.8.4).

“[the historic period] is commonly assumed to commence in the year that the earliest historic event occurred in, but it may be more appropriate to base this on the year in which reliable anecdotal evidence (in the form of newspaper or other extant records) can be assumed to be available.”

Use of regional information

We are fortunate in Australia to have the Regional Flood Frequency Analysis (RFFA) tool which provides flood frequency estimates over the populated areas of Australia. Sometimes the relationships between flood size and site characteristics provided by the RFFA, may be useful to transpose other flow indices.

Documentation

Nathan and McMahon provide a comprehensive list of what needs to be provided in a report.  The items that stood out for me were:

• A map that shows the area to be modelled, waterways, catchments, sub-catchments,  and location of rainfall, evaporation and flow monitoring stations
• Tables and charts that show the availability of data
• Information on ratings for flow gauges
• Clear description of what data were used and an explanation of any data that were excluded
• Description of the data used in calibration and validation
• Statistics and graphical descriptions of model performance
• Values of all model parameters and how they were obtained
• Details of flood maxima used at the key sites of interest including information on:
• sources of data
• dates of occurrence of floods
• extent of extrapolation of the rating curve
• outliers and how they were handled
• historical data
• Details of flood frequency analysis:
• choice of probability model
• method of fitting
• the use of prior information for parameters
• confidence limits for the fitted flood frequency distribution
• All acronyms should be defined (and located in one place in the report)
• It is not appropriate to leave key details out of reporting by referring to other reports that the reviewer can’t access.

Nathan and McMahon conclude that the issues in most need of additional consideration are:

• Provision of more detail around the provenance and preparation of datasets
• Increased rigour regarding the calibration and validation of models
• The use of regional information and independent methods to derive ‘best estimates’ based on consideration of the relative sources of uncertainty
• Assessment of salient uncertainties and their impact on the conclusion drawn.

Reporting needs to include enough detail to satisfy a technical reviewer.  It is fine for reports to be aimed at a general audience provided there is sufficient technical detail in appendices.

References

Tate, B., & Russell, K. (2014). Improving rating curves with 2D hydrodynamic modelling. Hydrology and Water Resources Symposium, pp. 873–880. Perth : Engineers Australia.
https://search.informit.com.au/documentSummary;dn=387128427466229;res=IELENG

R. J. Nathan, and T. A. McMahon (2017) Recommended practice for hydrologic investigations and reporting.  Australasian Journal of Water Resources 21(1):3-19

Guidelines

Barma, D. and I. Varley (2012) Hydrological modelling Practices for Estimating Low Flows – Guidelines.  Lows Flows Report Series.  Canberra: National Water Commission

Hydrologic modelling practice notes https://www.mdba.gov.au/publications/mdba-reports/hydrologic-modelling-practice-notes

Beven, K. and P. Young (2013) A guide to good practice in modelling semantics for authors and referees.  Water Resources Research 49(8):5092-5098.

Vaze, J., P. Jordan, R. Beecham, A. Frost and G Summerell (2012) Guidelines for Rainfall-runoff modelling – towards best practice model application eWater Cooperative Research Centre.

Jakeman, A. J., Letcher, R. A. and Norton, J. P. (2006) Ten iterative steps in development and evaluation of Enviornmental models.  Environmental modelling and software. 21(5):602-614

# The value of a word

This is recreational mathematics rather than hydrology…

I received an email recently about the value of hardwork, knowledge and attitude.  If a, b, c,…,z are represented by the numbers 1, 2, 3,…, 26, then hardwork adds to 98, knowledge to 96 and attitude to 100.

“While hard work and knowledge will get you close, attitude will get you there!”

What about the value of other words?  Here’s an R function that will calculate the value of a word or phrase.

Wordsum = function(my_word){
x = setNames(1:26, letters)
x1 = unlist(strsplit(tolower(my_word), '?'))
sum(x[x1], na.rm = TRUE)
}  

Lets try it.

Wordsum('hydrology')
#129

Wordsum('hydraulics')
#120


Clearly hydrology is more auspicious, but neither compare to Geomorphology (171).

# 1% flood: binomial distribution, conditional probabilities

I previously wrote about considering the occurrence of 1% floods as a binomial distribution, this post extends that analysis to look at conditional probabilities.  Some of the results are counter intuitive, at least to me, in that the risk of multiple 1% floods is larger than I would have guessed.

The probability of a 1% (1 in 100) annual exceedance probability (AEP) flood occurring in any year is 1%.  This can be treated as the probability of a “success”  in the binomial distribution, with the number of trials being the number of years. So the probability of having exactly one 1% flood in 100 years is

${100\choose 1}0.01^{1}\left( 1-0.01\right) ^{99} = 0.37$

In R this can be calculated as dbinom(x = 1, size = 100, prob = 0.01) or in excel =BINOM.DIST(1,100, 0.01, FALSE).

The cumulative distribution function of the binomial distribution is also useful for flood calculations.

What is the probability of 2 or more 1% floods in 100 years:

R: pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE) = 0.264

Excel: =1 - BINOM.DIST(1,100, 0.01, TRUE) = 0.264

We can check this by calculating the probability of zero or one flood in 100 years and subtracting that value from 1.

1 - (dbinom(x = 1, size = 100, prob = 0.01) + dbinom(x = 0, size = 100, prob = 0.01)) = 0.264

We can also do conditional probability calculations which could be useful for risk assessment scenarios.

What is the probability that exactly two 1% floods occur in 100 years given that at least one occurs?

$\Pr{(X = 2\mid X \ge 1)}$ =
dbinom(x = 2, size = 100, prob = 0.01)/pbinom(q = 0, size = 100, prob = 0.01, lower.tail = FALSE) = 0.291

What is the probability that at least two 1% floods occur in 100 years given that at least one occurs?
$\Pr{(X \ge 2\mid X \ge 1)}$ =
pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE)/pbinom(q = 0, size = 100, prob = 0.01, lower.tail = FALSE) = 0.416

We can also check this by simulation.  This code generates the number of 1% floods in each of 100,000 100-year sequences.  We can then count the number of interest.

set.seed(1969) # use a random number seed so the analysis can be repeated if necessary
floods = rbinom(100000,100, 0.01) # generate the number of 1% floods in each of 100,000, 100-year sequences

floods_subset = floods[floods >= 1] # Subset of sequences that have 1 or more floods
# Number of times there are two or more floods in the subset of 1 or more floods

sum(floods_subset >= 2) / length(floods_subset)
# 0.4167966

# or
sum(floods >= 2)/sum(floods >= 1)

#[1] 0.4167966



A slightly tricker situation is a question like: What is the probability of three or fewer floods in 100-years given there is more than one.

$\Pr{(X \le 3\mid X > 1)} = \Pr(X \le 3 \cap X > 1 )/\Pr( X > 1)$

floods_subset = floods[floods > 1] # Subset of sequences that have more than one flood

# Number of times there are three or fewer floods in the subset of more than one flood

sum(floods_subset ≤ 3) / length(floods_subset)
#[1] 0.9310957

# Or, for the exact value

# (Probability that X = 3 + Probability that X = 2)/(Probability that X > 1)
(dbinom(x = 3, size = 100, prob = 0.01) + dbinom(x = 2, size = 100, prob = 0.01))/ pbinom(q = 1, size = 100, prob = 0.01, lower.tail = FALSE)
#[1] 0.9304641



The probability of experiencing at least one 1% flood in 100-years is $1 - (1-0.01)^{100}$ = 0.634.  How many years would we need to wait to have a 99% chance of experiencing a 1% flood?

$0.99 = 1-(1-0.1)^n$

$n=\frac{log(0.01)}{log(0.99)} = 458.2$.  The next largest integer is 459.

We can also solve this numerically.  In R the formula is 0.99 = pbinom(q=0, size = n, prob = 0.01), solve for n. Using the uniroot function gives n = 459 years (see below).

So all these areas subject to a 1% flood risk will flood eventually, but it may take a while.

f = function(n) {
n = as.integer(n) #n must be an integer
0.99 - pbinom(q = 0, size = n, prob = 0.01, lower.tail = FALSE)
}

# \$root
# [1] 458.4999

uniroot(f, lower = 100, upper = 1000)

pbinom(q = 0, size = 459, prob = 0.01, lower.tail = FALSE)
# [1] 0.990079



How many years before there is a >99% chance of experiencing more than one flood? This is one minus (the probability of zero floods + the probability of one flood).

Let the number of years equal n.

$1-((1-0.01)^n + n(0.01)(1-0.01)^{n-1}) = 0.99$. Solving for n gives 662 years

# Using a Treemap to show gauged areas

According to Wikipedia a Treemap is a method for displaying hierarchical data using nested rectangles.  Its easier to explain with an example which also shows how Treemaps are useful in hydrology.

In this post we’ll use a Treemap to provide information on the catchment of Western Port, a bay to the east of Melbourne; in particular, the proportion of the catchment that is gauged.  By the way, according to the Vicnames database, the official name is Western Port, not Western Port Bay.

There are 4 main gauged tributaries draining to Western Port:

• Cardina Creek
• Bunyip River
• Lang Lang River
• Bass River

Figure 1 shows the gauged portions of these catchments overlaid on a shaded relief base map.

Figure 1: Gauged catchments draining to Western Port

We can also plot the the gauged and ungaged portions of these catchments on a treemap along with the remainder of the Western Port Catchment (Figure 2).  Or just the gauged portions compared to the rest (Figure 3).

Figure 2: Gauged and ungauged portions of the 4 catchments

Figure 3: Gauged portions of the four catchments compared with the rest of the Western Port Catchments

If we are going to estimate flow or pollution loads to Western Port than we need to factor up the results from the gauges.  A treemap shows how much extrapolation is necessary.  Code to produce these treemaps is available as a gist.

# How do you spell Eumemmerring?

Edit 10 Sep 2019.  The WMIS has been updated so the spelling of Eumemmerring is now correct.

There is a waterway in Melbourne’s South East called Eumemmerring Ck (note the spelling, the first ‘m’ is single, second ‘m’ doubled and the ‘r’ is doubled).  This is the official spelling from the Vicnames website.  On the Victorian water monitoring site its “Eummemering” (first ‘m’ doubled, single ‘r’) as in the stream gauges:

On the Bureau of Meteorology Water Data Online its Eumemmering (single first ‘m’, double second ‘m’, single ‘r’:

• 228203C Eumemmering Creek at Dandenong South
• 228235A Eumemmering Creek at Narre Warren North

Looking at the numbers of hits on a google search:

• Eumemmerring, the correct spelling (299,000 results)
• Eummemering, the spelling on the WMIS website (663 results)
• Eumemmering, the spelling on the Bureau of Meteorology’s water data online (9610) results.

So the correct spelling is winning on the internet but there is significant support for the BOM spelling.

A map, from the Argus 30 Oct 1888 (p3) supports the current official spelling as does the Parish Plan from 1859 (Figures 1 and 2).  It looks like the Bureau and the Victorian WMIS need to correct their records.

Figure 1: Map published in the Argus on 30 Oct 1888 (page 3)

Figure 2: Parish Plan (Victoria, Public Lands Office, 1859) (http://handle.slv.vic.gov.au/10381/150137)

There is more history of Eumemmerring provided by the City of Casey.

# Model performance based on coefficient of efficiency

The Nash-Sutcliffe coefficient of efficiency E, is commonly used to assess the performance of rainfall runoff models.

$E = \frac{\sum \left(O_i - \bar{O} \right)^2 - \sum \left( M_i - O_i \right)^2 }{\sum \left(O_i - \bar{O} \right)^2 }$

Where O are the observed values and M are the modelled values.

The maximum values of E is 1.  A value of zero indicates that the model is only as good as using the mean of the observations.  E can be less than zero.

So what value of E indicates a model is reasonable? Researchers (Chiew and McMahon, 1993) surveyed 93 hydrologists, (63 responded ) to find out what diagnostic plots and goodness of fit statistics they used, which were the most important, and how they were used to classify the quality of a model fit. The most important diagnostic graphs were timeseries plots and scatter plots of simulated and recorded flows from the data used for calibration. R-squared and Nash-Sutcliffe model efficiency coefficient (E) were the favoured goodness of fit statistics. Results were considered acceptable if E ≥ 0.8.

I adapted the values provided by Chiew and McMahon (1993) to create the following table (Ladson, 2008).

Table 1: Model performance based on coefficient of efficiency

Classification Coefficient of efficiency
(Calibration)
Coefficient of efficiency
(Validation)
Excellent E ≥ 0.93 E ≥ 0.93
Good 0.8 ≤ E < 0.93 0.8 ≤ E < 0.93
Satisfactory 0.7 ≤ E < 0.8 0.6 ≤ E < 0.8
Passable 0.6 ≤ E < 0.7 0.3 ≤ E < 0.6
Poor E < 0.6 E < 0.3

Others have suggested different ranges.  For example, when assessing an ecosystem model in the North Sea, the following categorisation was used E > 0.65 excellent, 0.5 to 0.65 very good, 0.2 to 0.5 as good, and <0.2 as poor (Allen et al., 2007).  Moriasi et al (2015) provide the performance evaluation criteria shown in Table 2.

Table 2: Criteria for Nash-Sutcliffe coefficient of efficiency (Moriasi et al., 2015)

Component Temporal scale Very good Good Satisfactory Not Satisfactory
Flow Daily Monthly Annual > 0.80 0.7 < E ≤ 0.8 0.50 < E ≤ 0.70 ≤ 0.50
Sediment Monthly > 0.80 0.7 < E ≤ 0.8 0.45 < E ≤ 0.7 ≤ 0.45
Nitrogen
Phosphorus
Monthly > 0.65 0.50 < E ≤ 0.65 0.35 < E ≤ 0.50 ≤ 0.35

In modelling of flows to the Great Barrier Reef (GBR) the following criteria were adopted (Waters, 2014):

• Daily Nash Sutcliffe Coefficient of Efficiency, E > 0.5
• Monthly E > 0.8

For GBR constituent modelling, the following ranges were used for the coefficient of efficiency: Very good E >0.75; Good 0.65 < E ≤ 0.75; Satisfactory 0.50 < E ≤ 0.65; unsatisfactory ≤ 0.50.

Also note that the Nash-Sutcliffe coefficient has been criticised and there are alternative proposals but it remains frequently used in hydrologic modelling.  See the references below for more information.

There is also an interesting discussion on stakoverflow https://stats.stackexchange.com/questions/414349/is-my-model-any-good-based-on-the-diagnostic-metric-r2-auc-accuracy-e/

References

Allen, J., P. Somerfield, and F. Gilbert (2007), Quantifying uncertainty in high‐resolution coupled hydrodynamic‐ecosystem models, J. Mar. Syst.,64(1–4), 3–14, doi:10.1016/j.jmarsys.2006.02.010. (link to research gate)

Bardsley, W.E. (2013) A goodness of fit measure related to r2 for model performance assessment.  Hydrological Processes. 27(19):2851-2856. DOI: 10.1002/hyp.9914

Criss RE, Winston WE. 2008. Do Nash values have value? Discussion and alternate proposals. Hydrological Process 22: 2723–2725.

Gupta HV, Kling H. 2011. On typical range, sensitivity, and normalization of mean squared error and Nash-Sutcliffe efficiency type metrics. Water Resources Research 47: W10601, doi:10.1029/2011WR010962.

Gupta HV, Kling H, Yilmaz KK, Martinez GF. 2009. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology 377:80–91.

Ladson, A. R. (2008) Hydrology: an Australian Introduction.  Oxford University Press. (link)

McCuen RH, Knight Z, Cutter, AG. 2006. Evaluation of the Nash–Sutcliffe efficiency index. Journal of Hydrologic Engineering 11:597–602.

Moriasi, D., Gitau, M. Pai, N. and Daggupati, P. (2015) Hydrologic and Water Quality Models: Performance Measures and Evaluation Criteria Transactions of the ASABE (American Society of Agricultural and Biological Engineers) 58(6):1763-1785 (Link to article at research gate)

Murphy, A. H. (1988) Skill scores based on the mean square error and their relationship to the correlation coefficient.  Monthly Weather Review 116: 2417-2424.

Waters, D. (2014) Modelling reductions of pollutant loads due to improved management practices in the Great Barrier Reef catchments. Whole of GBR, Technical Report. Volume 1.Department of Natural Resources and Minds, Queensland Government. (Link to article at research gate)

Willmott, C. J. (1981) On the validation of models.  Physical Geography 2: 184-194.