155

I have a Pandas data frame and one of the columns, have dates in string format of YYYY-MM-DD.

For e.g. : '2013-10-28'

At the moment the dtype of the column is object.

How do I convert the column values to Pandas date format?

1
  • Pandas has a native DATETIME type (datetime64); it doesn't have a native DATE dtype (any column containing DATE objects will be object dtype). It's much faster to work with datetime64 instead of dtype=object column of date objects. You can see here that something as simple as adding timedelta to timestamps is 100x faster on datetime64. If you really need DATE objects (e.g. need to dump the data into a database), then it's probably better to work on datetime64 while in pandas and convert to dates before storing the data elsewhere. Commented Apr 15, 2024 at 20:32

11 Answers 11

160

Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality e.g. dayfirst):

In [11]: df
Out[11]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [12]: pd.to_datetime(df['time'])
Out[12]:
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]

In [13]: df['time'] = pd.to_datetime(df['time'])

In [14]: df
Out[14]:
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Handling ValueErrors
If you run into a situation where doing

df['time'] = pd.to_datetime(df['time'])

Throws a

ValueError: Unknown string format

That means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:

df['time'] = pd.to_datetime(df['time'], errors='coerce')
Sign up to request clarification or add additional context in comments.

10 Comments

Hi Guys, @AndyHayden can you remove the time part from the date? I don't need that part?
In pandas' 0.13.1 the trailing 00:00:00s aren't displayed.
and what about in other versions, how do we remove / and or not display them?
I don't think this can be done in a nice way, there is discussion to add date_format like float_format (which you've seen). I recommend upgrading anyway.
my problem is my date is in this format... 41516.43, and I get this error. I would expect it to return something like 2014-02-03 in the new column?! THE ERROR: #convert date values in the "load_date" column to dates budget_dataset['date_last_load'] = pd.to_datetime(budget_dataset['load_date']) budget_dataset -c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
|
138

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

7 Comments

Nice - thank you - how do I get rid of the 00:00:00 at the end of each date?
The pandas timestamp have both date and time. Do you mean convert it into python date object?
You can convert it by df['time'] = [time.date() for time in df['time']]
what does the [ns] mean, can you make the text string a date and remove the time part of that date?
@yoshiserry it's nanoseconds, and is the way the dates are stored under the hood once converted properly (epoch-time in nanoseconds).
|
47

I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:

dfcsv = pd.read_csv('xyz.csv', parse_dates=[0]) where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

You can also select column(s) to parse by name rather than position, e.g. parse_dates=['thedate']

1 Comment

Thanks, that was exactly what I needed. The documentation has moved, though, you can find it here: pandas.pydata.org/pandas-docs/stable/reference/api/…
30

Now you can do df['column'].dt.date

Note that for datetime objects, if you don't see the hour when they're all 00:00:00, that's not pandas. That's iPython notebook trying to make things look pretty.

4 Comments

This one does not work for me, it complains: Can only use .dt accessor with datetimelike values
you may have to do df[col] = pd.to_datetime(df[col]) first to convert your column to date time objects.
The issue with this answer is that it converts the column to dtype = object which takes up considerably more memory than a true datetime dtype in pandas.
This person didn't say anything about the size of the dataset, or about parquet.
13

If you want to get the DATE and not DATETIME format:

df["id_date"] = pd.to_datetime(df["id_date"]).dt.date

2 Comments

I agree with @Asclepius. I've had a lot more issues with using .dt.date than just converting to datetime first.
we are getting following error: could not convert string to float: '12/12/2010'. How to solve it ?
7

Another way to do this and this works well if you have multiple columns to convert to datetime.

cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)

3 Comments

Question ask for date not datetime.
@MarkAndersen aslong as you have date only values in your columns, convertion to datetime will retain pertaining information only. If you explicity convert using df['datetime_col'].dt.date that will result to an object dtype; loss in memory management.
There is no reason to use .apply here, considering a direct use of pd.to_datetime works.
2

It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.

#set an index by dates
df.set_index(['time'], drop=True, inplace=True)

After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

#Convert to monthly dates
df.index = df.index.to_period(freq='M')

#Convert to strings
df.index = df.index.strftime('%Y-%m')

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

For brevity, I don't show that I run the following code after each line above:

print(df.index)
print(df.index.dtype)
print(type(df.index))

This gives me the following output:

Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>

Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Comments

1

For the sake of completeness, another option, which might not be the most straightforward one, a bit similar to the one proposed by @SSS, but using rather the datetime library is:

import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%d-%m').date())

Comments

0
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  object
 1   endDay          110526 non-null  object

import pandas as pd

df['startDay'] = pd.to_datetime(df.startDay)

df['endDay'] = pd.to_datetime(df.endDay)

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  datetime64[ns]
 1   endDay          110526 non-null  datetime64[ns]

1 Comment

No, this converts it to a 'datetime64[ns]' type not a 'date' type. Those are different things.
-1

Try to convert one of the rows into timestamp using the pd.to_datetime function and then use .map to map the formular to the entire column

Comments

-1

You can use the pandas.to_datetime()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.