1

I have a similar question to this: Convert date column (string) to datetime and match the format and I want to convert a string like '12/7/21' to '2021-07-12' as a date object. I believe the answer given in the link above is wrong and here is why:

# The suggested solution on Stackoverflow 
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21']})
>>> df['Date']
0    15/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0   2021-07-15
Name: Date, dtype: datetime64[ns]

Because Python doesn't care about the specified format in the above code! If you simply change 15 to 12 and input '12/7/21' then it treats 12 as month instead of day:

>>> df = pd.DataFrame({'Date':['12/7/21']})
>>> df['Date']
0    12/7/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'].astype('datetime64'),format='%d/%m/%y')
0   2021-12-07
Name: Date, dtype: datetime64[ns]

Does anyone know what's the best solution to this problem? (In R you simply use lubridate::dmy(df$Date) and it works perfectly)

1 Answer 1

3

.astype('datetime64') attempts to parse the string MM/DD/YY however if it can't (in the case that MM > 12) it will fall back to parsing as DD/MM/YY this is why you see seemingly inconsistent behaviour:

>>> import pandas as pd
>>> pd.Series('15/7/21').astype('datetime64')
0   2021-07-15
dtype: datetime64[ns]
>>> pd.Series('14/7/21').astype('datetime64')
0   2021-07-14
dtype: datetime64[ns]
>>> pd.Series('13/7/21').astype('datetime64')
0   2021-07-13
dtype: datetime64[ns]
>>> pd.Series('12/7/21').astype('datetime64')
0   2021-12-07
dtype: datetime64[ns]

The way to solve this is just to pass a Series of strings to pd.to_datetime instead of intermediately converting to datetime64s. So you can simply do

pd.to_datetime(df['Date'], format='%d/%m/%y')

without the .astype cast

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks @tiger. Is it possible to change multiple columns using pipe instead of doing it one by one by assignment? Like if I want to change multiple columns in different ways it would be a lot of assignments and repetitions of df.
Yep pd.to_datetime also supports DataFrames so you can do pd.to_datetime(df[['col1', 'col2', 'col3']], format='%d/%m/%y') However if you have different formats you will need to use different assignments for each format.
Yeah I meant for different formats. In R you can do "df |> mutate(col1 = f_1(col1), col2 = f_2(col2), col3 = f_3(col3))", f_i being some arbitrary transformation, so there is no need for multiple assignments, "mutate" can take as many transformation as you want. I wish Python had a similar feature too, because numerous assignments is horrible.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.