Find the dates of the first and last non-missing data values

Often flow and water quality data will supplied for a block of dates so there may be a large number of leading and trailing missing values. It is useful to be able to find where non-missing data starts and ends.

# Generate some test data
my.dates <- seq(from = as.Date('2000-01-01'), to = as.Date('2000-12-31'), by = '1 day')
flow = c(rep(NA, times = 100), rlnorm(116), rep(NA, times = 150))

We can find the element of the flow vector that corresponds to the first and list non-missing value as follows.

min(which(!is.na(flow))) # first non-missing element
max(which(!is.na(flow))) # last non-missing element

Using this as an index to my.dates will return the date of the first and last non-missing values.

my.dates[min(which(!is.na(flow)))] # date of first non-missing value
## [1] "2000-04-10"
my.dates[max(which(!is.na(flow)))] # date of last non-missing value
## [1] "2000-08-03"

The usual case is that data would be in a dataframe with multiple columns. Using the ideas above, we can write a function to return the start and end dates of non-missing data in each column. Use lapply to apply the function to each column and then wrap up results into another dataframe with do.call.

# Generate an additional column of data with leading and trailing missing values.
turbidity = c(rep(NA, times = 66), rlnorm(200), rep(NA, times = 100))

# Store all the test data in a data frame.
df.test = data.frame(dates = my.dates, flow = flow, turbidity = turbidity)

# In the following function
# df = data frame 
# date.col = name of the column that contains dates

NonMissingStartEnd <- function(my.col, date.col, df){
  start <- df[[date.col]][min(which(!is.na(df[[my.col]])))]
  end <- df[[date.col]][max(which(!is.na(df[[my.col]])))]
  data.frame(my.col = my.col, start=start, end=end)
  
}

# apply function to all columns
x <-  lapply(names(df.test), NonMissingStartEnd, date.col = 'dates', df=df.test)
do.call(rbind,x)
my.col start end
dates 2000-01-01 2000-12-31
flow 2000-04-10 2000-08-03
turbidity 2000-03-07 2000-09-22

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s