Dates of missing data

Continuing from the last post, where missing data were mapped, here we identify the start and end dates of sequences of missing values. This analysis uses the rle (run length encoding) function in R and is based on the discussion in Section 2.1 (page 31) of Spector's Data manipulation with R.
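
As a quick illustration of what rle returns, here is a minimal sketch on a made-up logical vector (the vector miss and the names run.ends and run.starts are hypothetical and are not part of the flow data used later). Each run of identical values is collapsed into a length and a value, and the cumulative sum of the lengths gives the position where each run ends.

```r
# Hypothetical example vector: TRUE marks a missing observation
miss <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

r <- rle(miss)
r$lengths   # 2 3 1 1
r$values    # FALSE TRUE FALSE TRUE

# Position of the last element of each run
run.ends <- cumsum(r$lengths)            # 2 5 6 7
# Position of the first element of each run
run.starts <- run.ends - r$lengths + 1   # 1 3 6 7

# Keep only the runs of missing values
run.starts[r$values]   # 3 7
run.ends[r$values]     # 5 7
```

Subtracting the run length from the run end is one way to get the start positions; it avoids treating a run at the very beginning of the record as a special case.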

The code from the previous post generated the following map of missing values.

Map of missing values

The code below identifies the start and end dates of each sequence of missing data, and the length of each sequence in days and years. For the test data it produces the following table.

starts        ends          length (days)   length.years
9/04/2000     29/04/2000    20              0.055
22/06/2005    4/11/2006     500             1.370
4/10/2008     8/11/2009     400             1.096

###################################################################
#
# Start and end dates of missing flow sequences
#
# 
# tony.ladson@moroka.com.au
# 10 Jan 2015
#
####################################################################


# Generate some test data

xdate <- seq(from = as.Date('1/1/2000', format = '%d/%m/%Y'),
             to = as.Date('31/12/2010', format = '%d/%m/%Y'),
             by = '1 day')

# generate some flow data, usually this would come from a stream gauging record
flow <- rlnorm(n= length(xdate), meanlog = 1, sdlog=0.5)

# insert some missing data into the flow record
is.na(flow[100:120]) <- TRUE
is.na(flow[2000:2500]) <- TRUE
is.na(flow[3200:3600]) <- TRUE

# make a data frame - put your own data in place of my.data
my.data <- data.frame(xdate = xdate, flow = flow)


# use the rle function to determine start and end dates of missing data
# For details, see Section 2.1 of Spector, P. (2008) Data manipulation with R. Springer

rle.seq <- with(my.data, rle(is.na(flow)))
index <- which(rle.seq$values == TRUE)

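# Each run of missing values starts one position after the previous run ends.
# If the record begins with missing values there is no previous run, so
# newindex is set to 0; R silently drops a zero subscript, and the start
# position 1 is then added back explicitly.
newindex <- ifelse(index > 1, index - 1, 0)
starts <- cumsum(rle.seq$lengths)[newindex] + 1
if(0 %in% newindex) starts <- c(1, starts)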

with(my.data, xdate[starts])
# [1] "2000-04-09" "2005-06-22" "2008-10-04"

ends <- cumsum(rle.seq$lengths)[index]

with(my.data, xdate[ends])
# [1] "2000-04-29" "2006-11-04" "2009-11-08"

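# summarise data in a data frame
x <- data.frame(starts = with(my.data, xdate[starts]), ends = with(my.data, xdate[ends]))
# length is the difference between the end and start dates (a difftime in days),
# so the number of missing daily values in each run is length + 1
x <- within(x, length <- ends - starts)
x <- within(x, length.years <- length/365)
x <- within(x, attr(length.years,'units') <- 'years') # correct the units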

# Write to clipboard in a format that can be pasted into Excel or Word
# (writing to 'clipboard' works on Windows; other platforms need a different connection)
write.table(x, 'clipboard', row.names = FALSE, sep = '\t')
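
The steps above can also be wrapped in a small reusable function. The following is a minimal sketch, not part of the original script: the function name missing.runs, its arguments and the n.days column are hypothetical. It assumes one value per day with dates in order, and it reports the count of missing values in each run rather than the difference between the end and start dates.

```r
# Hypothetical helper: returns one row per run of missing values
missing.runs <- function(dates, values) {
  r <- rle(is.na(values))
  run.ends <- cumsum(r$lengths)            # last position of each run
  run.starts <- run.ends - r$lengths + 1   # first position of each run
  data.frame(starts = dates[run.starts[r$values]],
             ends   = dates[run.ends[r$values]],
             n.days = r$lengths[r$values]) # number of missing values in the run
}

# Small made-up daily series with gaps at the start and in the middle
d <- seq(as.Date('2000-01-01'), by = '1 day', length.out = 10)
v <- c(NA, NA, 3, 4, NA, NA, NA, 8, 9, 10)
res <- missing.runs(d, v)
res
```

Applied to the test data, missing.runs(my.data$xdate, my.data$flow) should give the same start and end dates as above. Note that n.days counts the missing values themselves, so the first gap (rows 100 to 120) would be reported as 21 days rather than the 20 days obtained from ends minus starts.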
