I have a dataset that I want to analyze with pandas, but the date formats in it are inconsistent. Even after changing the column with Format Cells, some dates are still stored as text.

[Image: data_set]

What I get in Python: [image]

  • You need to correct your data import routine so that all the dates are "real" dates. Then the number format will be irrelevant. Commented Aug 14, 2021 at 11:18

1 Answer

You can use pd.to_datetime() with the errors='coerce' parameter, which returns NaT for any value that does not match the given format:

import pandas as pd

# parse the Date column once per expected format; rows that do not match become NaT
df['Date1'] = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')
df['Date2'] = pd.to_datetime(df['Date'], format='%m-%d-%y', errors='coerce')

Combine the results with .combine_first():

df['Date_combined'] = df['Date1'].combine_first(df['Date2'])
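
If more than two formats turn up, the same idea can be wrapped in a loop. The following is only a sketch: parse_mixed_dates is a hypothetical helper name, not a pandas function, and the sample values and format list are assumptions chosen for illustration.

```python
import pandas as pd


def parse_mixed_dates(series, formats):
    """Try each format in order and keep the first successful parse per row.

    Hypothetical helper for illustration; not part of pandas.
    """
    if not formats:
        raise ValueError("at least one format string is required")
    result = None
    for fmt in formats:
        # Rows that do not match this format become NaT
        parsed = pd.to_datetime(series, format=fmt, errors="coerce")
        # Fill only the rows that are still missing from earlier formats
        result = parsed if result is None else result.combine_first(parsed)
    return result


dates = pd.Series(["11/26/2013", "08-02-13", "not a date"])
parsed = parse_mixed_dates(dates, ["%m/%d/%Y", "%m-%d-%y"])
print(parsed)
```

Formats listed earlier take priority, so put the most specific or most trusted format first.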

Then you can sort by the combined column (sort_values() returns a new DataFrame, so assign the result):

df = df.sort_values(by='Date_combined')
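
Because errors='coerce' silently turns anything that matches neither format into NaT, it may be worth checking for leftovers before relying on the sorted result. A minimal sketch, assuming a made-up third value ("2014.01.22") that fits neither format:

```python
import pandas as pd

# Small sample; the last value is deliberately in an unexpected format
df = pd.DataFrame({"Date": ["11/26/2013", "08-02-13", "2014.01.22"]})

d1 = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")
d2 = pd.to_datetime(df["Date"], format="%m-%d-%y", errors="coerce")
df["Date_combined"] = d1.combine_first(d2)

# Rows that matched no format are still NaT after combining
unparsed = df[df["Date_combined"].isna()]
print(unparsed)

# With the default na_position, NaT rows are placed at the end when sorting
sorted_df = df.sort_values(by="Date_combined")
print(sorted_df)
```

Any rows that show up in unparsed point to a format that still needs to be added to the list.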

Demo

Input:

         Date
0  11/26/2013
1  11/26/2015
2   3/23/2014
3    08-02-13
4    08-02-15
5    09-02-13
6   1/22/2014

Output:

         Date      Date1      Date2 Date_combined
0  11/26/2013 2013-11-26        NaT    2013-11-26
1  11/26/2015 2015-11-26        NaT    2015-11-26
2   3/23/2014 2014-03-23        NaT    2014-03-23
3    08-02-13        NaT 2013-08-02    2013-08-02
4    08-02-15        NaT 2015-08-02    2015-08-02
5    09-02-13        NaT 2013-09-02    2013-09-02
6   1/22/2014 2014-01-22        NaT    2014-01-22