100-year flood: Geometric distribution – Number of years to the next flood

Here we are interested in the probability distribution of the number of years, Y, that will elapse before the next 100-year flood occurs. The number of independent trials up to, but not including, the first success has a geometric distribution. Applying this to flooding means that the number of safe years we wait before a 100-year flood has a geometric distribution with probability parameter 0.01:

Y \sim \mathrm{geom}(0.01)

Example

You have a 5-year assignment to a flood-prone mining camp.  What is the probability that you will experience a 100-year flood while you are living in the camp?

We need the probability that we will wait 4 or fewer safe years before a 100-year flood. A flood in the fifth year of the assignment means 4 safe years followed by a flood.

The probability that you will need to wait n years or fewer is:

\displaystyle \sum _{y=0}^{n} p(1-p)^y

In this case, we are interested in:

\displaystyle \sum _{y=0}^{4} 0.01(1-0.01)^y = 0.049

That is, there is a 4.9% chance that the camp will experience a 100-year flood while we are living there.

In R

pgeom(4, 0.01)  # 0.049

We can also calculate this using the Binomial distribution as 1 minus the probability of zero floods in 5 years.

1 - pbinom(0, 5, 0.01)  # 0.049
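As a quick check, here is a minimal R sketch that computes the 5-year probability three ways: the explicit sum, `pgeom`, and `1 - pbinom`. It compares them with the closed form 1 - (1 - p)^(n+1), which holds because the sum is a finite geometric series. The variable names are illustrative only.

```r
p <- 0.01   # annual probability of a 100-year flood
n <- 4      # at most 4 safe years, i.e. a flood within a 5-year stay

by_sum      <- sum(p * (1 - p)^(0:n))   # explicit sum of the geometric pmf
by_pgeom    <- pgeom(n, p)              # geometric CDF, P(Y <= n)
by_binomial <- 1 - pbinom(0, n + 1, p)  # 1 - P(no floods in n + 1 years)
closed_form <- 1 - (1 - p)^(n + 1)      # finite geometric series

round(c(by_sum, by_pgeom, by_binomial, closed_form), 4)
# [1] 0.049 0.049 0.049 0.049
```

The closed form makes the binomial route easy to see: 1 - (1 - p)^(n+1) is exactly one minus the chance of n + 1 flood-free years.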

The probability that you will be safe for 4 years and then be flooded in the 5th year is given by the geometric probability mass function:

P(Y = y) = p(1-p)^y

In this case:

0.01(1-0.01)^4 = 0.0096

That is about 0.96%.

In R
dgeom(4, 0.01)  # 0.0096
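A short R sketch can check that `dgeom` matches the formula p(1-p)^y term by term for the first five years. It assumes R's convention that `dgeom` counts safe years (failures) before the first flood, which is the same Y used here.

```r
p <- 0.01
y <- 0:4   # number of safe years before the flood

# P(Y = y) = p (1 - p)^y: y safe years followed by a flood
by_formula <- p * (1 - p)^y
by_dgeom   <- dgeom(y, p)

all.equal(by_formula, by_dgeom)
# [1] TRUE
by_dgeom[y == 4]   # safe for 4 years, flooded in year 5
# [1] 0.00960596
```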

The probability of being flooded increases as the length of our assignment increases as shown in the graph below.

Probability of being flooded as a function of the number of years living in the camp
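The curve in the graph is the geometric CDF evaluated at the length of the stay minus one. Here is a minimal R sketch under that assumption. It uses the curve to find the shortest assignment with at least a 50% chance of seeing a 100-year flood. The 150-year range is an arbitrary choice.

```r
p <- 0.01
years <- 1:150                  # possible assignment lengths, in years
risk  <- pgeom(years - 1, p)    # P(flood during the stay) = P(Y <= years - 1)

# Risk for a few assignment lengths
data.frame(years = c(1, 5, 10, 30, 100),
           risk  = round(risk[c(1, 5, 10, 30, 100)], 3))

# Shortest stay with at least a 50% chance of a 100-year flood
min(years[risk >= 0.5])
# [1] 69
```

Even a 100-year stay is not certain to include a 100-year flood. Its risk is 1 - 0.99^100, which is about 0.634.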

We can also plot the probability of being safe for a certain number of years and then being flooded in the next year.

Probability of waiting for a certain number of years and then being flooded in the next year

Notice that the probability of being flooded in the first year (being safe for zero years) is actually the highest.
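The reason the first year is the most likely is that each additional safe year multiplies the probability by (1 - p) = 0.99. The pmf therefore shrinks steadily from y = 0. A small R sketch confirms this. The 300-year cut-off is arbitrary.

```r
p <- 0.01
safe_years <- 0:300
pmf <- dgeom(safe_years, p)

# Ratio of consecutive pmf values: P(Y = y + 1) / P(Y = y) = 1 - p
ratios <- pmf[-1] / pmf[-length(pmf)]
range(ratios)                  # both ends are 0.99, up to rounding

safe_years[which.max(pmf)]     # the mode: zero safe years
# [1] 0
```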

Steven Pinker has a nice discussion of this effect in his book ‘The Better Angels of Our Nature’, about 17 pages into Chapter 5. His discussion uses lightning strikes, which I’ve rephrased as floods here.

Suppose that floods are random: every year, the chance of a flood is the same, 1%. Your house was flooded this year. When is it most likely you will be flooded again? The answer is next year. That probability, to be sure, is not very high: 1%. Now think about the chance you will be safe for a year and then experience a flood the year after next. For that to happen, two things have to take place. First, flooding will have to occur the year after next, with a probability of 1%. Second, flooding will not occur next year. To calculate that probability you have to multiply the chance there will be no flood next year (0.99, or 1 minus 0.01) by the chance there will be a flood the year after (1%, or 0.01). The probability of these two events occurring is 0.0099, a bit lower than the chance of a flood next year. What is the chance of being safe for 99 years and then getting a flood? The probability is (1 − 0.01)^99 × 0.01 ≈ 0.0037, roughly a third of the chance of a flood next year.
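The arithmetic in this argument is just the geometric pmf evaluated at a few points. Here is a minimal R sketch using `dgeom`, as elsewhere in this post:

```r
p <- 0.01
next_year       <- dgeom(0, p)    # flooded again next year: 0.01
year_after_next <- dgeom(1, p)    # safe next year, flooded the year after: 0.99 * 0.01
after_99_safe   <- dgeom(99, p)   # 99 safe years, then a flood: 0.99^99 * 0.01

round(c(next_year, year_after_next, after_99_safe), 4)
# [1] 0.0100 0.0099 0.0037
```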

So, if you’ve just experienced a 100-year flood, when are you most likely to experience the next one? After 100 years? Nope. It is most likely to happen again next year. The idea that, having had a big flood, we will now be safe for a while is a fallacy.

We can use a random variable drawn from a geometric distribution to simulate the number of safe years before the next 100-year flood occurs.
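In R, `rgeom` draws the number of failures before the first success, which matches the safe-year count Y. This sketch checks that by comparing `rgeom` with a direct year-by-year simulation. The helper `safe_years_until_flood` is hypothetical and written only for this illustration, and the seed is arbitrary.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
p <- 0.01

# Hypothetical helper (not from the original post):
# step through the years, counting safe years until the first flood
safe_years_until_flood <- function(p) {
  safe <- 0L
  while (runif(1) >= p) {   # no flood this year, which happens with probability 1 - p
    safe <- safe + 1L
  }
  safe
}

by_loop  <- replicate(5000, safe_years_until_flood(p))
by_rgeom <- rgeom(5000, p)

# Both samples estimate the same distribution; compare a few summaries
rbind(by_loop  = c(mean = mean(by_loop),  median = median(by_loop),
                   zero = mean(by_loop == 0)),
      by_rgeom = c(mean = mean(by_rgeom), median = median(by_rgeom),
                   zero = mean(by_rgeom == 0)))
```

In practice `rgeom` is the simpler and faster choice. The loop is only there to make the "count safe years until the first flood" definition explicit.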

The mean of a Geometric distribution is

\frac{(1-p)}{p}

In this case, p = 0.01 so the mean number of safe years before a 100-year flood is

\frac{(1-0.01)}{0.01} = 99
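The same value falls out numerically as the sum of y × P(Y = y). This R sketch truncates the infinite sum at 10,000 years, an arbitrary cut-off. The neglected tail probability is about 0.99^10001, which is negligible.

```r
p <- 0.01
y <- 0:10000                          # truncated support; the tail beyond is negligible
mean_by_sum     <- sum(y * dgeom(y, p))   # E[Y] = sum of y * P(Y = y)
mean_by_formula <- (1 - p) / p

c(mean_by_sum, mean_by_formula)
# [1] 99 99
```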

On average, we will be safe for 99 years. This is consistent with the idea that the 100-year flood will occur on average once every 100 years. This may be comforting, but the distribution of the number of safe years isn’t nicely bunched around 99. The median number of safe years can be calculated using the formula:

\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1

In this case, with p = 0.01, the median number of safe years before the next 1% flood is 68, and the mode (most common value) is zero (i.e. the flood will occur next year). The probability mass values for Y \sim \mathrm{geom}(0.01) are shown below.

Geometric probability mass function (p = 0.01).  The probability that a flood will occur in the year after the indicated number of years without a flood
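The median formula can be checked against R's quantile function `qgeom`, which returns the smallest y with P(Y ≤ y) ≥ 0.5. This sketch uses p = 0.01 and, as an arbitrary illustration, a few other annual probabilities.

```r
p <- c(0.01, 0.02, 0.05, 0.10)   # a few annual flood probabilities

median_formula <- ceiling(-1 / log2(1 - p)) - 1
median_qgeom   <- qgeom(0.5, p)   # smallest y with P(Y <= y) >= 0.5

rbind(p, median_formula, median_qgeom)
```

For p = 0.01 both give 68, because P(Y ≤ 67) is just under 0.5 and P(Y ≤ 68) is just over it.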

Simulating the number of years between floods produces the results shown below (100 simulations in a 10 x 10 grid). The mean and median are near their theoretical values (about 99 and 68 years respectively), but many simulations produce only a few years between floods, while a few produce a very large number of years.

Simulation of the number of years until a 1% flood occurs. One hundred different random values drawn from a Geometric distribution with p = 0.01
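A grid like this can be produced with a few lines of R. The sketch below is an assumption about how such a simulation might be set up, not a copy of the code in the gist. The seed is arbitrary, so the individual values will differ from the figure.

```r
set.seed(2011)                               # arbitrary seed; other seeds give other grids
gaps <- rgeom(100, 0.01)                     # 100 simulated counts of safe years
grid <- matrix(gaps, nrow = 10, ncol = 10)   # arranged as a 10 x 10 grid

grid
c(mean = mean(gaps), median = median(gaps))  # compare with the theoretical 99 and 68
sum(gaps < 20)                               # number of short gaps in this run
100 * pgeom(19, 0.01)                        # expected number of gaps under 20 years, about 18
```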

We see the same pattern if we simulate a large number of years and randomly determine whether a 100-year flood occurs in each year. A 10,000-year simulation is shown below (100 x 100 year grid). Blue squares show the occurrence of 100-year floods.

A 10,000 year simulation (100 x 100 year grid). Blue squares show the occurrence of 100-year floods

The mean and median time between floods are close to their theoretical values, but, as we saw before, there are many short gaps between floods and a few very long flood-free periods. Where the blue squares are close together, floods are clustered.
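Here is a minimal R sketch of the year-by-year version. It assumes each year independently has a 1% chance of a 100-year flood. It records the flood years, measures the safe years between consecutive floods, and arranges the 10,000 years as a 100 x 100 grid. The seed is arbitrary, and the plotting line is left commented out as an optional extra.

```r
set.seed(1940)   # arbitrary seed
p <- 0.01
n_years <- 10000

flood <- runif(n_years) < p                      # TRUE in years with a 100-year flood
grid  <- matrix(flood, nrow = 100, ncol = 100)   # 100 x 100 year grid, filled column by column

flood_years <- which(flood)
safe_gaps   <- diff(flood_years) - 1             # safe years between consecutive floods

sum(flood)                                       # expect about 100 floods
c(mean = mean(safe_gaps), median = median(safe_gaps))
table(cut(safe_gaps, c(-1, 19, 99, Inf)))        # short, medium and long gaps

# Optional: draw the grid, with blue cells for flood years
# image(grid, col = c("white", "blue"), axes = FALSE)
```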

This type of counterintuitive clustering in random data has been noted in many other situations, including the bombing of London during WWII and the patterns produced by glowworms. See Pinker (2012) and Gould (1991) for further discussion.

Pinker, S. (2012) The Better Angels of Our Nature. London: Penguin Books. pp. 245-247.

Gould, S. J. (1991) Glow, big glowworm. In Bully for Brontosaurus. New York: Norton. (As cited by Pinker.)

For R code see this gist.
