2

I have this data frame where the date columns are 'datetime' type

datetime64[ns]
2014-04-30T00:00:00 000000000

Now I want the date in this format - 2014-04-30 . So I used below code

df['StartingDate2'] = XY['StartingDate'].dt.strftime('%m/%d/%Y')

Now, this works accurately, but converts my date to object type. I read here that In Python, dates are objects. But I want to write my final data frame as a table in redshift and so I need my date columns defined as date for my python data frame. Any suggestion about how to do that , will be greatly appreciated.


Update:

sample dataframe

p1 = {'name': ['johnny', 'tommy', 'bobby', 'rocky', 'jimmy'], 'StartingDate': ['2015-07-14T00:00:00.000000000', '2013-10-30T00:00:00.000000000', '2014-04-30T00:00:00.000000000', '2014-01-27T00:00:00.000000000', '2016-01-15T00:00:00.000000000'], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA'], 'comment1': ['Very good performance', 'N/A', 'Need to work hard', 'No Comment', 'Not satisfactory'], 'comment2': ['good', 'Meets Expectation', 'N', 'N/A', 'Incompetence']}
XY = pd.DataFrame(data = p1)
XY['today'] = datetime.datetime.now()

When I am using to_datetime() solution - it doesn't work

XY['today2'] = pd.to_datetime(XY['today'],  format = '%m/%d/%Y')
XY['StartingDate2'] = pd.to_datetime(XY['today'],  format = '%m/%d/%Y')

Alternatively - this works, when strftime() and to_datetime() are used in combination.

XY['StartingDate2'] = XY['StartingDate2'].dt.strftime('%m/%d/%Y')
XY['StartingDate2'] = pd.to_datetime(XY['StartingDate2'])

But although this solution works for the sample data, is not working for me. The data I have looks like this -

array(['2015-09-29T14:34:39.000000000', '2015-10-07T14:13:03.000000000',
   '2015-10-07T19:17:50.000000000', ...,
   '2017-12-05T14:06:42.000000000', '2017-12-06T16:36:44.000000000',
   '2017-12-06T18:26:49.000000000'], dtype='datetime64[ns]'
1
  • You can floor the datetime to day frequency. I will update my post later Commented May 17, 2018 at 16:08

2 Answers 2

3

The solution is to use to_datetime

s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])
s
0    3/11/2000
1    3/12/2000
2    3/13/2000
dtype: object

pd.to_datetime(s)
0   2000-03-11
1   2000-03-12
2   2000-03-13
dtype: datetime64[ns]

So in your situation, you could write

df['StartingDate2'] = pd.to_datetime(XY['StartingDate'], format='%m/%d/%Y')

and in fact you can neglect the format keyword here. But if you provide it, you will get a huge speed increase.

Benchmarks

s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000']*1000)

Without format

%%timeit
pd.to_datetime(s)
453 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With format

%%timeit
pd.to_datetime(s, format='%m/%d/%Y')
9.68 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note for me: Always provide the format string if possible

Answer for updated question

You can use date times floor function

df = pd.DataFrame(pd.to_datetime(['2015-09-29T14:34:39.000000000', '2015-10-07T14:13:03.000000000',
   '2015-10-07T19:17:50.000000000',
   '2017-12-05T14:06:42.000000000', '2017-12-06T16:36:44.000000000',
   '2017-12-06T18:26:49.000000000']), columns=['A'])

df['B'] = df['A'].dt.floor('d')
df.dtypes

A    datetime64[ns]
B    datetime64[ns]
dtype: object

df

                      A          B
0   2015-09-29 14:34:39 2015-09-29
1   2015-10-07 14:13:03 2015-10-07
2   2015-10-07 19:17:50 2015-10-07
3   2017-12-05 14:06:42 2017-12-05
4   2017-12-06 16:36:44 2017-12-06
5   2017-12-06 18:26:49 2017-12-06
Sign up to request clarification or add additional context in comments.

4 Comments

I think, I have tried to_datetime() before but was not successful. let me check again. I will confirm the answer then.
In my case it doesn't change anything. So I created a new dataframe using datetime.datetime.now() to test your code. I have updated the post at the bottom
Thanks for your solution. One more thing - when I use this ' pd.Series(df['CreatedDate2']).dt.floor('d') ' - the output shows yy-mm-dd in datetime format, which is what I want. But when I am trying to create a new column based on that using - df['CreatedDate3'] = pd.Series(df['CreatedDate2']).dt.floor('d') - it just goes back to timestamp format. I wonder why is that happening.
I updated again. Could you explain what timestamp format is?
0
df['StartingDate2'] = pd.to_datetime(XY['StartingDate'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.