Parse multiple date formats in pandas

Question

I've got stuck with the following format:

0   2001-12-25  
1   2002-9-27   
2   2001-2-24   
3   2001-5-3    
4   200510
5   20078

What I need is the date in the format %Y-%m.

What I tried was

 def parse(date):
     if len(date)<=5:
         return "{}-{}".format(date[:4], date[4:5], date[5:])
     else:
         pass

  df['Date']= parse(df['Date'])

However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came out as None. How can I do this? Thank you!

Umar.H · Accepted Answer · 2020-06-05 09:10:57Z

We can use pd.to_datetime with errors='coerce' to parse the dates in steps, one format at a time.

Assuming your column is called date:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01

In steps:

First, we parse the values that are full year-month-day dates into a new series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4          NaT
5          NaT
Name: date, dtype: datetime64[ns]

As you can see, the series now has two NaT values (null datetimes). These correspond to the entries that have no day component and so do not match '%Y-%m-%d'; errors='coerce' turns them into NaT instead of raising an error.
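To make the effect of errors='coerce' concrete, here is a minimal sketch using two sample values taken from the question (the variable names `values`, `raised` and `coerced` are only illustrative). It assumes the default errors='raise' behaviour, where a value that does not match the format raises a ValueError.

```python
import pandas as pd

# Two values from the question: one full date, one year-month string.
values = pd.Series(["2001-5-3", "200510"])

# Default errors="raise": the value that does not match the format
# is expected to raise a ValueError.
try:
    pd.to_datetime(values, format="%Y-%m-%d")
    raised = False
except ValueError:
    raised = True

# errors="coerce": the non-matching value becomes NaT instead.
coerced = pd.to_datetime(values, format="%Y-%m-%d", errors="coerce")

print(raised)
print(coerced.isna().tolist())
```

The NaT entries are what make the second pass possible, since they mark exactly the rows that still need parsing.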

We then call pd.to_datetime again with the year-month format '%Y%m', and use fillna so that only the missing values of s are replaced:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4   2005-10-01
5   2007-08-01

Then we assign the result back to your dataframe.
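Putting the steps together, a self-contained sketch might look like the following. The sample data is copied from the question and the column name date is the one assumed in this answer. The year_month column is an addition that is not part of the original answer: it uses Series.dt.strftime to render the parsed dates as %Y-%m text, which is the format the question asks for.

```python
import pandas as pd

# Sample data copied from the question; the column name "date" is assumed.
df = pd.DataFrame({"date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})

# Pass 1: full year-month-day dates; anything else becomes NaT.
s = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Pass 2: year-month strings fill the gaps left by pass 1.
s = s.fillna(pd.to_datetime(df["date"], format="%Y%m", errors="coerce"))

df["date_fixed"] = s

# Render as "%Y-%m" text (an extra step, not in the original answer).
df["year_month"] = s.dt.strftime("%Y-%m")

print(df)
```

Note that year_month holds strings, not datetimes, so keep date_fixed around if you need date arithmetic later.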

sammywemmy · Answer · answered Jun 5, 2020 at 9:11 (edited by halfer, 2020-07-13 10:09:13Z)

You could use a regex to pull out the year and month, and convert to datetime:

df = pd.read_clipboard(sep=r"\s{2,}", header=None, names=["Dates"])

pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"

df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])

print(df)

        Dates
0   2001-12-01
1   2002-09-01
2   2001-02-01
3   2001-05-01
4   2005-10-01
5   2007-08-01

Note that pandas automatically sets the day to 1, since only the year and month were supplied.
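If you would rather not go through the clipboard, a sketch along the same lines can build the frame directly from the sample data in the question. The year_month and period columns below are illustrative additions, not part of the original answer: the first zero-pads the extracted month to produce %Y-%m text, and the second converts it to a monthly period in case date arithmetic is needed.

```python
import pandas as pd

# Sample data copied from the question; the column name "Dates" follows this answer.
df = pd.DataFrame({"Dates": ["2001-12-25", "2002-9-27", "2001-2-24",
                             "2001-5-3", "200510", "20078"]})

# Same pattern as above: four-digit year, optional hyphen, one- or two-digit month.
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
parts = df["Dates"].str.extract(pattern)

# Zero-pad the month and join with the year to get "%Y-%m" text directly.
df["year_month"] = parts["Year"] + "-" + parts["Month"].str.zfill(2)

# Optional: a monthly Period column for later date arithmetic.
df["period"] = pd.to_datetime(df["year_month"], format="%Y-%m").dt.to_period("M")

print(df)
```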

answered Jun 5, 2020 at 9:11 by sammywemmy; edited Jul 13, 2020 at 10:09 by halfer

2 Comments

almo Over a year ago

Hi @sammywemmy, I accepted Datavoice's answer as his answer came first, but your solution is also excellent. Thank you :)

sammywemmy Over a year ago

not a problem ... as long as ur challenge is solved, all is fine
