When cleaning a dataset, you need to check everything. I’m trying to make sense of the extreme storms archive provided by the Bureau of Meteorology. Most of the records are fine, but there are some errors in latitude and longitude (Figure 1).
Australian towns like Cobram, Rutherglen and Wodonga (the bottom rows in Figure 1) are all in the southern hemisphere, so by convention their latitudes are negative. Similarly, the whole of Australia is well south of the equator, so latitudes of 0 or -1 cannot be correct. The values -1, 0 and 1 appear to be sentinel values indicating missing data, so they need to be recoded.
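Recoding like this takes only a few lines of base R. This is a minimal sketch, assuming the latitudes sit in a numeric vector in decimal degrees; `clean_latitude` is a hypothetical helper. Treating any remaining positive latitude as a dropped minus sign is an assumption based on every town being in the southern hemisphere; flagging those records for a manual check would be the more cautious choice.

```r
# Hypothetical helper: recode sentinel latitudes to NA and, as an
# assumption, treat remaining positive latitudes as a dropped minus sign.
clean_latitude <- function(lat, sentinels = c(-1, 0, 1)) {
  lat[lat %in% sentinels] <- NA    # -1, 0 and 1 mean "missing"
  flip <- !is.na(lat) & lat > 0    # northern-hemisphere values are errors
  lat[flip] <- -lat[flip]          # assume the minus sign was dropped
  lat
}

clean_latitude(c(-36.05556, 36.05556, 0, -1, 1))
# gives c(-36.05556, -36.05556, NA, NA, NA)
```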
Wikipedia lists the extreme points of Australia. Those for the mainland plus Tasmania are:
- Northernmost point – Cape York Peninsula, Qld, -10°41′ (10°41′ S)
- Southernmost point – South East Cape, Tas, -43°39′ (43°39′ S)
- Westernmost point – Steep Point, WA, 113°09′ E
- Easternmost point – Cape Byron, NSW, 153°38′ E
Points outside this range may refer to islands or may be errors, so it is best to check them.
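This check can be automated with a simple bounding box built from the four extreme points. This is a minimal sketch, assuming the archive stores decimal degrees with south negative; `dm_to_decimal`, `aus_box` and `outside_box` are hypothetical names. A flagged record is a candidate for manual checking, not automatically wrong, since islands also fall outside the box.

```r
# Convert degrees and minutes to decimal degrees, keeping the sign of the
# degrees (south and west negative). sign(0) is 0, so a value such as
# -0°30' would need separate handling; that never arises for Australia.
dm_to_decimal <- function(deg, min) {
  sign(deg) * (abs(deg) + min / 60)
}

# Approximate bounding box for mainland Australia plus Tasmania,
# built from the extreme points listed above.
aus_box <- list(
  lat_min = dm_to_decimal(-43, 39),  # South East Cape, -43.65
  lat_max = dm_to_decimal(-10, 41),  # Cape York, about -10.683
  lon_min = dm_to_decimal(113, 9),   # Steep Point, 113.15
  lon_max = dm_to_decimal(153, 38)   # Cape Byron, about 153.633
)

# TRUE where a record falls outside the box or has a missing coordinate,
# i.e. a record worth checking by hand.
outside_box <- function(lat, lon, box = aus_box) {
  inside <- lat >= box$lat_min & lat <= box$lat_max &
    lon >= box$lon_min & lon <= box$lon_max
  is.na(inside) | !inside
}

outside_box(lat = c(-36.05556, 36.05556, -36.05556, NA),
            lon = c(146.4625, 146.4625, 46.4625, 146.4625))
# [1] FALSE  TRUE  TRUE  TRUE
```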
Latitudes and longitudes can be looked up for a locality using the geocode function in the ggmap package in R. We can also go from coordinates to locality using revgeocode:
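```r
library(ggmap)
# Recent ggmap versions need a key first: register_google(key = "YOUR_KEY")

geocode('Rutherglen, Australia')
# Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Rutherglen,%20Australia&sensor=false
#        lon       lat
# 1 146.4625 -36.05556

# Hypothetical helper: the state is the address component (of the first
# result) whose types include "administrative_area_level_1"; look it up
# by type rather than by position in the list.
get_state <- function(x) {
  comps <- x$results[[1]]$address_components
  is_state <- vapply(comps,
                     function(comp) "administrative_area_level_1" %in% unlist(comp$types),
                     logical(1))
  comps[[which(is_state)[1]]]
}

# Which state is Rutherglen in?
x <- geocode('Rutherglen, Australia', output = 'all')
get_state(x)$long_name   # "Victoria"
get_state(x)$short_name  # "VIC"

# From coordinates to locality
revgeocode(c(146.4625, -36.05556))
# "62 Main St, Rutherglen VIC 3685, Australia"

# Which state are these coordinates in?
x <- revgeocode(c(146.4625, -36.05556), output = 'all')
get_state(x)$long_name   # "Victoria"
get_state(x)$short_name  # "VIC"

# Alternatively
x <- geocode('Rutherglen, Australia', output = 'more')
as.character(x$administrative_area_level_1)  # "Victoria"
```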
Some manual checking will still be required, but much of it can be automated, which will catch many gross errors.
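One way to automate part of the checking is to geocode each town once and measure how far each recorded position lies from it. This is a minimal sketch using the haversine great-circle distance. `haversine_km` and `flag_mismatch` are hypothetical helpers, and the 50 km tolerance is an arbitrary assumption. The reference point is the geocode() result for Rutherglen shown above, while the recorded positions are made up for illustration.

```r
# Great-circle (haversine) distance in km between points given in decimal
# degrees; 6371 km is the mean Earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# TRUE where the recorded position is missing or more than max_km from the
# geocoded town position (50 km is an assumed tolerance).
flag_mismatch <- function(rec_lat, rec_lon, ref_lat, ref_lon, max_km = 50) {
  d <- haversine_km(rec_lat, rec_lon, ref_lat, ref_lon)
  is.na(d) | d > max_km
}

# Reference: geocoded Rutherglen. Recorded values: illustrative only.
ref_lat <- -36.05556
ref_lon <- 146.4625
rec_lat <- c(-36.05556, 36.05556, -36.10)
rec_lon <- c(146.4625, 146.4625, 146.50)
flag_mismatch(rec_lat, rec_lon, ref_lat, ref_lon)
# [1] FALSE  TRUE FALSE
```

The second record is the sign error seen in Figure 1: about 8,000 km from the town, so it is flagged. The third is only a few kilometres away and passes.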