I have a data frame df with shape (500000,70) and several columns including invalid dates like 4000-01-01 00:00:00. In a smaller version of this data frame I tried

df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)

which worked fine. Also the version

df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00",pd.NaT))

worked. For the long data frame version I receive the following error

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00

Any suggestions how to solve this problem in an elegant way or what the problem might be?

Thank you.

2 Answers

2

If you add the parameter errors='coerce' to the to_datetime function, it returns NaT for every value that cannot be parsed as a datetime:

df["date"] = pd.to_datetime(df["date"], errors='coerce')
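
A minimal sketch of that call on a small made-up column (the sample values and the `n_missing` name are assumptions for illustration, not taken from the question's data). It assumes the nanosecond-resolution datetimes that the error message refers to, and it counts the resulting NaT values so the coercion does not go unnoticed:

```python
import pandas as pd

# Hypothetical sample column: one valid date, the sentinel value from the
# question, and one string that is not a date at all.
df = pd.DataFrame(
    {"date": ["2020-05-17 08:30:00", "4000-01-01 00:00:00", "not a date"]}
)

# errors="coerce" replaces values that cannot be converted with NaT
# instead of raising an exception.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Count the missing values afterwards so you know how many rows were affected.
n_missing = int(df["date"].isna().sum())
print(df)
print("NaT count:", n_missing)
```

Because errors='coerce' also hides genuinely malformed values, it may be worth comparing the NaT count with the number of sentinel dates you expect.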

1

The error is because:

In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')

That is the upper limit for a Timestamp. Your value lies outside this range, hence the OutOfBoundsDatetime error.
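
If you want to see which values fall outside that range before converting, something along these lines might help. It is only a sketch: the helper name `in_ns_bounds` and the assumed string format `YYYY-MM-DD HH:MM:SS` are illustrative choices, not part of pandas.

```python
from datetime import datetime

import pandas as pd

# Bounds of the nanosecond-resolution Timestamp, converted to plain datetime
# objects. The conversion drops the nanosecond part; warn=False silences the
# warning about that.
LOWER = pd.Timestamp.min.to_pydatetime(warn=False)
UPPER = pd.Timestamp.max.to_pydatetime(warn=False)


def in_ns_bounds(text: str) -> bool:
    """Return True if `text` fits in the Timestamp range.

    Hypothetical helper; assumes the format YYYY-MM-DD HH:MM:SS.
    """
    value = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return LOWER <= value <= UPPER


print(pd.Timestamp.min, pd.Timestamp.max)
print(in_ns_bounds("2020-05-17 08:30:00"))
print(in_ns_bounds("4000-01-01 00:00:00"))
```

Parsing with the standard library avoids the error because Python's datetime supports years up to 9999.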
