# Log-normal flood frequency analysis

Although the log-normal distribution is not usually used for flood frequency estimation in Australia it can be useful to compare results with more common approaches.

The general equation for flood quantile estimation (from Chow, 1951) is:

$x_t = \mu+K_t\sigma$      (1)

Where $K_t$ is the frequency factor that depends on the distribution of the flood data and the probability of interest, $\mu$ and $\sigma$ are the mean and standard deviation.

## Frequency factors

The classical frequency factor for the log-normal distribution is:

$K_t = \left( \frac{1}{C_v} \right) \left[ \exp \left\{ Z\left( \log \left(1 + C_v^2 \right) \right)^\frac{1}{2} - 0.5\left( \log \left( 1 + C_v^2 \right) \right) \right\} -1\right]$      (2)

Where, $Z$ is a standard normal deviate and $C_v$ is the coefficient of variation $= \frac{\mu}{\sigma}$.  The standard normal deviate can be calculated, for example, using the qnorm function in R.  So for the 100-year flood, exceedance probability = 1/100, $z = 2.326$.

An R function to calculate this log-normal frequency factor is:

# Frequency factor for the log-normal distribution
# m - mean
# s - standard deviation
# p - exceedance probability

FF_LogNormal <- function(m, s, p) {
cv <- s/m
z <- qnorm(1 - p)
(1/cv) * (exp(sqrt(log(1 + cv^2)) * z - 0.5 * log(1 + cv^2)) - 1)
}



### Example 1

Kite (2004) provides an example we can use for testing.

Annual maximum discharges of the Saint John River at Fort Kent, New Brunswick for the 37 years from 1927 to 1963 have a mean of 81,000 cfs and a standard deviation of 22,800. Estimate the 100-year flood.

As, calculated below, Q100 = 148,000 cfs

m <- 81000
s <- 22800
p <- 0.01

Kt <- FF_LogNormal(81000, 22800, p) #2.943

(Q100 <- m + Kt * s)
#[1] 148221.3



If flood data are log-normally distributed, that means the logarithms of the flood data are normally distributed.  This suggests there are two ways of calculating the flood quantiles.  We can use the data as-is along with the log-normal frequency factor (as in the above example), or take the logs of the data and use the frequency factor from the normal distribution.

### Example 2

Continuing with the example from Kite.  The mean and standard deviation of the logarithms of the 37 events from the Saint John River are 11.263 and 0.284 (in cfs units).

The frequency factor for the 100-year flood, based on the normal distribution, will be 2.326 so the 100-year flood estimate is 150,800 cfs.  Similar to the previous example but not exactly the same.

m.log <- 11.263
s.log <- 0.284

Kt <- qnorm(1 - 0.01) #2.326348

(Q100 <- exp(m.log + Kt*s.log))
#[1] 150795.9


### Example 3

Lets repeat this analysis using some Australian data.  An annual flood series for the Hunter River at Singleton is available here.

The two estimates for the 100-year flood are 10,500 cumec using data as-is and 13,800 cumec when calculating in the log domain.

# Hunter River at Singleton
library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'

singleton <- source_data(my.url)
str(singleton)

m <- mean(singleton$Peak (cumec)) # 1401.7 s <- sd(singleton$Peak (cumec))   # 2312.9
Kt <- FF_LogNormal(m, s, 0.01) # 3.917
(Q100 <- m + Kt * s)
# 10460.5

m.log <- mean(log(singleton$Peak (cumec))) # 6.425 s.log <- sd(log(singleton$Peak (cumec)))   # 1.336
Kt <- qnorm(1 - 0.01) # 2.326
(Q100 <- exp(m.log + Kt*s.log))
# 13818.9



## Bayesian frequency factor

Kuczera (1999) derives a frequency factor for the log-normal distribution which takes account of the uncertainty in the parameters – the mean and standard deviation.  These parameters must be estimated from the flood data.  Assuming nothing is known about these parameters other that what we learn from the flood measurements (i.e. the Bayesian prior is noninformative), the frequency factor is:

$K_t = t_{n-1} \times \sqrt{1 + \frac{1}{n}}$

Using this frequency factor will result in a more conservative estimate of flood quantiles. As the sample size increases the the Bayesian frequency factor will approach the frequency factor based on the normal distribution.

An R function to calculate this Bayesian log-normal frequency factor is:

# Frequency factor for the log-normal distribution
# n - number of data points
# p - exceedance probability

FF_LogNormal_Bayes <- function(n, p) {

qt( 1-p, n-1) * sqrt(1 + 1/n)

}



### Example 4

Calculate the Bayesian estimate for the 100-year flood quantile using the data for the Hunter River at Singleton.

Calculations are below.  The estimate is 17,350 cumecs.

library(repmis)
my.url <- 'https://dl.dropboxusercontent.com/u/10963448/Singleton.csv'

singleton <- source_data(my.url)

m.log <- mean(log(singleton$Peak (cumec))) # 6.425 s.log <- sd(log(singleton$Peak (cumec))) # 1.336
Kt <- FF_LogNormal_Bayes(nrow(singleton), 0.01) # 2.4966

(Q100 <- exp(m.log + Kt*s.log))
# 17348


### Testing

So we have three flood quantile estimators for log-normally distributed data:

1. Take logs and use the frequency factor from the normal distribution
2. Use data as-as and calculate the frequency factor using equation 2.
3. Take logs and use the Bayesian frequency factor.

If the data set is large, all the results are the same.

set.seed(2016)
my.flood <- rlnorm(1e6, 6, 1)  # generate 1 flood values from a log-normal distribution.

# True value

exp(6 + 1*qnorm(1 - 0.01)) # 4131.302

# Direct calculation of quantile
quantile(my.flood, probs = 0.99, type = 8 ) # 4114.213

# Taking logs and using normal distribution frequency factor
Q100.log <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - 0.01)
exp(Q100.log) # 4130.316

# Use frequency factor from equation 2
Q100 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), 0.01)
Q100 # 4127.146

# Bayes frequency factor
Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), 0.01)
exp(Q100.Bayes) #4130.336



For small datasets, the estimates vary substantially.

First, let’s make a function that simulations 30 years of log-normal flood peaks and calculates the 100-year quantiles.

Test_f <- function(i, p) {

# i = dummy variable so the function can be included in a loop
# p = exceedance probability

my.flood <- rlnorm(30, 6, 1) # Generate 30 years of flood data
Q100.normal <- mean(log(my.flood)) + sd(log(my.flood))*qnorm(1 - p)
Q100.eq2 <- mean(my.flood) + sd(my.flood) * FF_LogNormal(mean(my.flood), sd(my.flood), p)
Q100.Bayes <- mean(log(my.flood)) + sd(log(my.flood))*FF_LogNormal_Bayes(length(my.flood), p)
data.frame(Q100.normal = exp(Q100.normal), Q100.eq2 = Q100.eq2, Q100.Bayes = exp(Q100.Bayes))

}



Now, we’ll call this multiple times and compare the average quantiles for the three methods with the true quantile.

set.seed(5)
out <- lapply(1:10000, Test_f, p = 0.01)
out.df <- do.call(rbind, out)
colMeans(out.df)

Q100.normal Q100.eq2 Q100.Bayes
4334.727 3678.353 5204.641

# True quantile 4131



So, for for this data set, on average, the quantile based on taking the logs and using the frequency factor from the normal distribution is about right.  The quantile based on equation 2 is too low, and the quantile using the Bayesian frequency factor is too high.

In another post, we’ll look at quantifying the uncertainty in quantile estimates.

## References

Chow, V. T. (1951) A general formula for hydrologic frequency analysis.  Transactions, American Geophysical Union 32:231-237.

Kite, G. W. (2004) Frequency and risk analysis in hydrology.  Water Resources Publications, LLC.

Kuczera (1999) Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference.  Water Resources Research 35(5): 1551-1557.