Checking and cleaning data is time-consuming and tedious, but necessary. Fortunately, some smart people are starting to think about how it should be done, and a few R packages are now available. Here are some examples. So far I have mainly used assertr, which works well with dplyr, a package that is great for manipulating data frames.
- assertr – easy to add to a workflow based on data frames
- checkmate – checking arguments sent to functions
- valiData – not much documentation yet
- visdat + conference presentation – visualising likely issues in data frames
- validate – focusses on data checking against domain knowledge
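
As a rough sketch of how assertr can slot into a dplyr pipeline: the checks and thresholds below are my own illustrative assumptions, applied to R's built-in mtcars data set rather than to any real project data. By default, a failed check stops the pipeline with an error, so the summary at the end is only computed if every check passes.

```r
library(dplyr)
library(assertr)

# Illustrative checks only; the bounds and allowed values are assumptions.
checked <- mtcars %>%
  verify(nrow(.) > 10) %>%                # data frame has more than 10 rows
  verify(mpg > 0) %>%                     # every mpg value is positive
  assert(in_set(0, 1), am, vs) %>%        # am and vs contain only 0 or 1
  assert(within_bounds(50, 400), hp) %>%  # horsepower lies in an assumed plausible range
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))          # runs only if all checks above pass

print(checked)
```

Because each check takes a data frame and returns it unchanged when the check passes, the checks can sit between ordinary dplyr steps without restructuring the workflow.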