Converting multiple date formats into one in R

I am working with a messy Excel file that contains multiple date formats:

2016-10-17T12:38:41Z 
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC

I want to convert all of the above to yyyy-mm-dd format. I am using the following code for the conversion, but a lot of values are coming out as NA.

as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))

How can I convert all of them together? I have read other threads on similar cases, but nothing seems to work for mine. Please help.

Asked By: Neil

Answer #1:

If I add 'dmy' to the list, then at least all of the cases in your example are successfully parsed:

 z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016", 
 "10-Sep-15",  "13-Oct-09", "18-Oct-2016 05:42:26 UTC")

library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"

Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' and 'dmy' respectively? I'm not sure how any logic will let you auto-detect these differences. Out of context, "10-Sep-15" could reasonably be read as either 15 Sep 2010 or 10 September 2015.
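To make the ambiguity concrete, the following base-R sketch (an illustration, not part of the original answer) parses the same string under both readings. It forces the C locale for LC_TIME, an assumption made here so that %b matches the English abbreviation "Sep".

```r
# Assumption: use the C locale so %b recognises English month abbreviations.
invisible(Sys.setlocale("LC_TIME", "C"))

# The same two-digit-year string parsed under two different assumptions.
s <- "10-Sep-15"

as_dmy <- as.Date(s, format = "%d-%b-%y")  # day-month-year reading
as_ymd <- as.Date(s, format = "%y-%b-%d")  # year-month-day reading

print(as_dmy)
print(as_ymd)
```

The first call gives 2015-09-10 and the second gives 2010-09-15, so nothing in the string itself can settle which one was intended.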

For what it's worth, I also tried the new anytime package; it only handled the first and last elements.
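If you know from context how the two-digit-year values should be read, one alternative to letting a parser guess is to pick a format from each string's shape. The helper below is a hypothetical base-R sketch, not something from lubridate or from the answers; the name to_date, its two_digit argument, and the regular expressions are assumptions that only cover the five shapes in the question.

```r
# Assumption: use the C locale so %b recognises English month abbreviations.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical helper: choose a format from each string's shape.
to_date <- function(x, two_digit = c("dmy", "ymd")) {
  two_digit <- match.arg(two_digit)
  x <- trimws(x)
  out <- rep(as.Date(NA), length(x))

  # ISO 8601 timestamps such as "2016-10-17T12:38:41Z": keep the date part
  iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", x)
  out[iso] <- as.Date(substr(x[iso], 1, 10), format = "%Y-%m-%d")

  # "Mon Oct 17 08:03:08 GMT 2016": keep month, day and the trailing year
  unix <- grepl("^[A-Za-z]{3} [A-Za-z]{3} [0-9]{1,2} .* [0-9]{4}$", x)
  out[unix] <- as.Date(
    sub("^[A-Za-z]{3} ([A-Za-z]{3}) ([0-9]{1,2}) .* ([0-9]{4})$",
        "\\2-\\1-\\3", x[unix]),
    format = "%d-%b-%Y"
  )

  # "18-Oct-2016 05:42:26 UTC": a four-digit year makes day-month-year clear
  long <- grepl("^[0-9]{1,2}-[A-Za-z]{3}-[0-9]{4}", x)
  out[long] <- as.Date(sub(" .*$", "", x[long]), format = "%d-%b-%Y")

  # "10-Sep-15": ambiguous, so the caller decides how to read it
  short <- grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{2}$", x)
  fmt <- if (two_digit == "dmy") "%d-%b-%y" else "%y-%b-%d"
  out[short] <- as.Date(x[short], format = fmt)

  out
}

dates <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
           "10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")
print(to_date(dates))                     # two-digit years read as dmy
print(to_date(dates, two_digit = "ymd"))  # two-digit years read as ymd
```

Anything that matches none of the patterns stays NA, which makes unexpected formats in the real data easy to spot.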

Answered By: Ben Bolker

Answer #2:

Removing the times first makes it possible to specify only three alternatives in orders to parse the sample data in the question. This interprets 10-Sep-15 and 13-Oct-09 as dmy; if you want them interpreted as ymd instead, uncomment the commented-out line:

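library(lubridate)

orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")

# x is the input vector shown in the note below.
# Strip hh:mm:ss-shaped times, then parse what is left as dates.
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))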

giving:

[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"

or if the commented out line is uncommented then:

[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
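
To see what the time-stripping step does on its own, it can help to look at what the gsub() call leaves behind. A small base-R sketch using the sample input from the question:

```r
# Sample input from the question (note the trailing space in the first element).
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
       "13-Oct-09", "18-Oct-2016 05:42:26 UTC")

# Each "." matches any single character, so "..:..:.." matches an
# hh:mm:ss-shaped run of eight characters and replaces it with one space.
stripped <- gsub("..:..:..", " ", x)
print(stripped)
```

After the substitution, each element holds only a date plus leftover text such as "T", "Z", "GMT" or "UTC"; the date-only elements are unchanged.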

Note: The input is:

x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15", 
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")

Answered By: G. Grothendieck

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.


