2

I'm stuck with a column of dates in the following mixed format:

0   2001-12-25  
1   2002-9-27   
2   2001-2-24   
3   2001-5-3    
4   200510
5   20078

What I need is the date in the format %Y-%m.

This is what I tried:

 def parse(date):
     if len(date)<=5:
         return "{}-{}".format(date[:4], date[4:5], date[5:])
     else:
         pass

  df['Date']= parse(df['Date'])

However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came back as None. How can I do this? Thank you!

2 Answers

7

We can use pd.to_datetime with errors='coerce' and parse the dates in two passes, one per format.

Assuming your column is called date:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01

Step by step:

First, we parse the full year-month-day dates into a new Series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4          NaT
5          NaT
Name: date, dtype: datetime64[ns]

As you can see, the series contains two NaT (null datetime) values. These correspond to the entries that have no day and therefore do not match '%Y-%m-%d'.

We then call pd.to_datetime again with the other format, '%Y%m', and use the result to fill the missing values of s:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4   2005-10-01
5   2007-08-01

Then we assign s back to your dataframe.
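Since the question asks for %Y-%m rather than full timestamps, one more step is probably needed. The following is a minimal sketch of the two passes above followed by Series.dt.strftime; the sample data is copied from the question, and the output column name year_month is only an illustrative choice.

```python
import pandas as pd

# Sample values from the question; the column name 'date' is assumed.
df = pd.DataFrame(
    {"date": ["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"]}
)

# Pass 1: full year-month-day strings; anything else becomes NaT.
s = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Pass 2: compact year-month strings fill the NaT gaps left by pass 1.
s = s.fillna(pd.to_datetime(df["date"], format="%Y%m", errors="coerce"))

# Render the parsed timestamps as 'YYYY-MM' strings
# ('year_month' is a hypothetical column name).
df["year_month"] = s.dt.strftime("%Y-%m")
print(df)
```

Note that the result of strftime is a column of strings, not datetimes, so keep s around if you still need date arithmetic.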


1

You could use a regex to pull out the year and month, then convert to datetime:

df = pd.read_clipboard(sep=r"\s{2,}", header=None, names=["Dates"])

pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"

df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])

print(df)

        Dates
0   2001-12-01
1   2002-09-01
2   2001-02-01
3   2001-05-01
4   2005-10-01
5   2007-08-01

Note that pandas automatically sets the day to 1, since only the year and month were supplied.
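If plain 'YYYY-MM' strings are enough, the same regex idea can also be written as an ordinary function, closer to the parse attempt in the question. This is only a sketch: to_year_month is a hypothetical helper name, it assumes every value is a string, and it does not check that the month is between 1 and 12.

```python
import re

# Four-digit year, optional hyphen, one- or two-digit month.
PATTERN = re.compile(r"(?P<year>\d{4})-?(?P<month>\d{1,2})")


def to_year_month(text):
    """Return 'YYYY-MM' for strings like '2001-5-3' or '20078', else None."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    # Zero-pad the month so '8' becomes '08'.
    return f"{match['year']}-{int(match['month']):02d}"


samples = ["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"]
result = [to_year_month(value) for value in samples]
print(result)
```

On a dataframe this would be applied element by element, for example with df['Dates'].map(to_year_month), rather than by passing the whole column to the function.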

2 Comments

Hi @sammywemmy, I accepted Datavoice's answer because it came first, but your solution is also excellent. Thank you :)
not a problem ... as long as ur challenge is solved, all is fine
