I have a dataset that I want to analyse with Pandas, but the date formats in it are inconsistent. Even after changing the dates with Format Cells, some of them are still stored as text.
You can call pd.to_datetime() once per date format, with the errors='coerce' parameter so that strings that don't match that format become NaT instead of raising an error:
import pandas as pd

# parse Date once per format string; values that don't match become NaT
df['Date1'] = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
df['Date2'] = pd.to_datetime(df['Date'], format='%m-%d-%y', errors='coerce')
Then combine the two results with .combine_first(), which fills the NaT values in Date1 with the corresponding values from Date2:
df['Date_combined'] = df['Date1'].combine_first(df['Date2'])
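If the column mixes more than two layouts, the same idea extends to a list of format strings: parse once per format and fold the results together with combine_first(). A minimal sketch; the helper name parse_mixed_dates and the sample values are illustrative assumptions, not part of pandas or of the original data.

```python
import pandas as pd

def parse_mixed_dates(s, formats):
    """Hypothetical helper: parse `s` with each format in `formats`, in order.

    Each value keeps its first successful parse; values that match no format stay NaT.
    """
    result = pd.to_datetime(s, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        parsed = pd.to_datetime(s, format=fmt, errors="coerce")
        # combine_first only fills positions that are still NaT in `result`
        result = result.combine_first(parsed)
    return result

# Illustrative data: the last value matches neither format
s = pd.Series(["11/26/2013", "3/23/2014", "08-02-13", "2014.01.22"])
print(parse_mixed_dates(s, ["%m/%d/%Y", "%m-%d-%y"]))
```

Order matters when a string is valid under more than one format: the earlier format in the list wins.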
Then sort by the combined column (sort_values() returns a new DataFrame rather than sorting df in place):
df_sorted = df.sort_values(by='Date_combined')
Demo
Input:
         Date
0  11/26/2013
1  11/26/2015
2   3/23/2014
3    08-02-13
4    08-02-15
5    09-02-13
6   1/22/2014
Output (df after adding the new columns, before sorting):
         Date      Date1      Date2 Date_combined
0  11/26/2013 2013-11-26        NaT    2013-11-26
1  11/26/2015 2015-11-26        NaT    2015-11-26
2   3/23/2014 2014-03-23        NaT    2014-03-23
3    08-02-13        NaT 2013-08-02    2013-08-02
4    08-02-15        NaT 2015-08-02    2015-08-02
5    09-02-13        NaT 2013-09-02    2013-09-02
6   1/22/2014 2014-01-22        NaT    2014-01-22
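The demo can be reproduced end to end with a short script. This sketch types the input column in by hand (in the question the data came from a spreadsheet, which is not modelled here) and adds a sanity check that every row was parsed, since errors='coerce' would otherwise hide unmatched strings as NaT.

```python
import pandas as pd

# Input column from the demo, typed in by hand
df = pd.DataFrame({"Date": ["11/26/2013", "11/26/2015", "3/23/2014",
                            "08-02-13", "08-02-15", "09-02-13", "1/22/2014"]})

# One parse per format; strings that don't match become NaT
df["Date1"] = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")
df["Date2"] = pd.to_datetime(df["Date"], format="%m-%d-%y", errors="coerce")

# Fill the gaps in Date1 from Date2
df["Date_combined"] = df["Date1"].combine_first(df["Date2"])
print(df)

# Sanity check: every row matched at least one format
assert df["Date_combined"].notna().all()

# Original strings in chronological order
print(df.sort_values(by="Date_combined")["Date"].tolist())
```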

