I am working with messy excel file with multiple date formats
2016-10-17T12:38:41Z
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC
I want to convert all of the above in yyyy-mm-dd
format. I am using following code for the conversion but lot of values are coming NA.
as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))
How can I do it all of them together. I have read other threads on similar case,but nothing seems to work for my case. Please help
If I add 'dmy'
to the list then at least all of the cases in your example are succesfully parsed:
z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
"10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")
library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"
Your big problem will be the third and fourth elements: are these actually meant to be 'ymd'
and 'dmy'
respectively? I'm not sure how any logic will let you auto-detect these differences ... out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable possibilities ...
For what it's worth I also tried the new anytime package - it only handled the first and last element.
Removing the times first makes it possible to specify only three alternatives in orders
to parse the sample data in the question. This interprets 10-Sep-15
and 13-Oct-09
as dmy but if you want them interpreted as ymd then uncomment the commented out line:
orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))
giving:
[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"
or if the commented out line is uncommented then:
[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
Note: The input is:
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")