Frequency factors for the Pearson III distribution

Flood frequency analysis is used to relate the magnitude of a flood to its probability of exceedance. The typical situation is that we have some data on past floods, perhaps the largest flood in each of the last 50 years, and would like to calculate the value for a design flood event, for example, the 1% flood.

There are standard approaches to this analysis as specified in engineering manuals such as Australian Rainfall and Runoff and Bulletin 17B.

Here I am writing about one small aspect of this process, the calculation of frequency factors for the Pearson III distribution.

Chow (1951) proposed a general equation to relate flood frequency and magnitude:

$\log \! 10 \! \left( Q_y \right ) = m + s \times K_y$

For annual flood frequency analysis, for a particular site where historical flood flows have been measured, the terms are defined as follows.

$Q_y$ = the flood discharge having an annual exceedance probability (AEP) of 1 in Y years
$m$ = mean of the logarithms of the peak annual discharges
$s$ = standard deviation of the logarithms of the peak annual discharges
$K_y$ = frequency factor for the required AEP.

$K_y$ depends on the chosen statistical distribution. In practical hydrology in Australia and the US, it has been common to assume that the logarithms of flood discharges are distributed according to the Pearson III distribution. This means that $K_y$ is a function of 1) AEP and 2) the skewness of the logarithms of the annual discharges i.e. we define:

$g$ = the skewness of the logarithms of the peak annual discharges; possibly weighted by information on skewness at neighbouring sites.

There are a few different formulas to calculate sample skewness Australian Rainfall and Runoff uses:

$g = \frac{ n^2 \sum \left(x_i^3 \right) - 3 n \sum x_i \sum x_i^2 + 2 \left( \sum x_i \right)^3} {n \left(n-1 \right) \left(n-2 \right) s^3}$

Where $n$ is the number of samples (floods) and $x$ is the log10 of the flood magnitudes.

In R this can be calculated using the skewness  function in the e1071 package with the type argument set to 2.

library(e1071)
x <- c(40, 49, 50, 51)
skewness(x, type = 2)
#[1] -1.845683



The parameters m, s and g are calculated from historical flood data, while traditionally the frequency factor has been looked up in a table (see an example below).  The source of these tabulated values are two papers by Harter (1969; 1971).

Tabulated frequency factors from Australian Rainfall and Runoff

It would certainly be more convenient to calculate the frequency factors, rather than look them up, and there are a few ways this can be done.

1. Wilson-Hilferty approximation

The frequency factor for the Pearson III distribution can be approximately related to standard normal deviate $z$ using the Wilson-Hilferty approximation.

$K_y \approx z + \left( z^2 - 1 \right) \frac{g}{6} + \frac{1}{3} \left( z^3 - 6z \right) \left( \frac{g}{6}\right) ^2 - \left( z^2 - 1 \right) \left( \frac{g}{6} \right) ^3 + z \left( \frac{g}{6} \right) ^4 - \frac{1}{3} \left( \frac{g}{6} \right)^5$

Example:

What is the frequency factor for AEP = 1% and g = 1?
$K_y \approx 3.030322$ as calculated below.

The Wilson-Hilferty approximation is reasonably accurate for $\left| g \right| \le 2$ and aep between 0.01 and 0.99 (Chowdhury and Stedinger, 1991)

Ky_WH <- function(g, aep){ # Wilson Hilferty approximation

z <- qnorm(1-aep) # exceedance probability
b <- g/6
z + (z^2 - 1)*b + (1/3)*(z^3 - 6*z)*b^2 - (z^2 - 1)*b^3 + z*b^4 - (1/3)*(b)^5
}

Ky_WH(1, 0.01)
# [1] 3.030322



2. Pearson III quantile functions in R

The frequency factor is equivalent to the quantile function of a distribution.  Several packages provide the functionality to calculate quantiles of the Pearson III distribution but the lmomco package uses a parameterisation that is most familiar to hydrologists.  The function we need is  quape3.  We can provide a convenient wrapper in R as follows.

library(lmomco)
Ky_Qp3 <- function(g, aep){
# Return the quantile of the Pearson III distribution given
# skewness of g, and annual exceedance probability, aep

param <- vec2par(c(0,1,g), type='pe3')
quape3(1 - aep, param)
}

Ky_Qp3(1, 0.01)
#[1] 3.022559


Notice that the frequency factor calculated here for g = 1, and aep = 1% is a bit different to that calculated using the Wilson-Hilferty approximation above i.e. 3.022559 compared to 3.030322.

3. Gamma formula

The Pearson III frequency factor can also be also be calculated from the quantile function of the gamma distribution. This can be useful in Excel which provides the GAMMA.INV function.

$\mathrm{shape} = a = 4/g^2$
$\mathrm{scale} = b = 1/ \sqrt{a}$
$K_y = \mathrm{sign \left( g \right) \left( \mathrm{qgamma} \left(1 - aep, \, shape = a, \, scale = b \right) - a \times b \right) }$

If g = 0, $K_y$ is set to the quantile of the normal distribution. If g < 0, then use aep, rather than 1 – aep in the qgamma function.

We need to be a bit careful, as there are numerical issues as g approaches zero.  It is always dangerous using exact comparisons for floating point numbers, a lesson I’ve needed to learn more than once (see Circle 1 in the R Inferno)

To get around this problem, use the Wilson-Hilferty for small values of g.  Some playing around suggests that qgamma does not produce the results we expect if g is less than 10-8,.  For these very small values, which will probably only arise through numerical approximations, use the Wilson-Hilferty approximation.

Ky_gamma <- function(g, aep){

if(abs(g) < 1e-8) { # Use the Wilson-Hilferty approximation to avoid numerical
# issues near zero
z <- qnorm(1-aep)
b <- g/6
return(z + (z^2 - 1)*b + (1/3)*(z^3 - 6*z)*b^2 -
(z^2 - 1)*b^3 + z*b^4 - (1/3)*(b)^5 )
}

if(g < 0) aep  <- 1 - aep
a  <- 4/(g^2)
b  <- 1/sqrt(a)
sign(g)*(qgamma(1 - aep, shape = a, scale = b) - a*b)
}

Ky_gamma(1, 0.01)
# [1] 3.022559


In the next blog I’ll do some comparisons between algorithms and with Harter’s tables.