
I have a dataframe df:

         id         timestamp           data group_id   date
56729   56970   2020-02-01 01:22:52.717 21.0    1   2020-02-01
57135   57376   2020-02-01 14:11:22.633 38.0    3   2020-02-01
57136   57377   2020-02-01 14:11:22.733 39.0    3   2020-02-01
57137   57378   2020-02-01 14:11:23.637 39.0    3   2020-02-01
57138   57379   2020-02-01 14:11:23.737 40.0    3   2020-02-01

and code:

df = df[df['data'] >0]
df['timestamp'] = pd.to_datetime(df['timestamp'])

start_date = pd.to_datetime('2020-02-01 00:00:00')
end_date = pd.to_datetime('2020-03-01 00:00:00')

df = df.loc[(df['timestamp'] > start_date) & (df['timestamp'] < end_date)]

df['date'] = df['timestamp'].dt.date
df = df.sort_values(by=['date'])
df = df[df['date'] == '2020-02-01']

The date column was created from timestamp so that I can group the df by date later on. But the code returns nothing when I slice df by a certain date, say 2020-02-01, even though there is data for that day. The output looks like this:

    id  timestamp   data    group_id    date

which is only the column names. What is wrong?

    Comparing a string with a datetime object does not work. Commented Mar 13, 2020 at 4:43
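A minimal reproduction of what the comment describes. It uses two timestamps from the question's sample rows; only the timestamp and date columns are rebuilt. The column created with .dt.date holds Python datetime.date objects (object dtype), and a date never equals a string, so the mask is all False:

```python
import pandas as pd

# Two timestamps taken from the question's sample rows
df = pd.DataFrame({"timestamp": pd.to_datetime(["2020-02-01 01:22:52.717",
                                                "2020-02-01 14:11:22.633"])})

# .dt.date yields plain Python datetime.date objects in an object-dtype column
df["date"] = df["timestamp"].dt.date
print(df["date"].dtype)          # object
print(type(df["date"].iloc[0]))  # <class 'datetime.date'>

# A datetime.date never equals a str, so every row is False and the slice is empty
mask = df["date"] == "2020-02-01"
print(mask.tolist())             # [False, False]
print(df[mask].empty)            # True
```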

2 Answers


Your df['date'] column contains datetime.date objects (dtype object), not strings, so none of them will ever be equal to '2020-02-01'. You can either do:

>>> df[df['date'] == pd.to_datetime('2020-02-01').date()]

Or,

>>> df[df['date'].astype(str) == '2020-02-01']
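A self-contained check of both ideas, using a few made-up rows shaped like the question's data. Option 1 compares against a datetime.date so both sides have the same type. This matters because, as far as I know, from pandas 2.0 on a pd.Timestamp never compares equal to a datetime.date. Option 2 relies on str(date) producing 'YYYY-MM-DD':

```python
import datetime
import pandas as pd

# Sample rows (assumed): two on 2020-02-01, one on the next day
df = pd.DataFrame({"timestamp": pd.to_datetime(["2020-02-01 01:22:52.717",
                                                "2020-02-01 14:11:22.633",
                                                "2020-02-02 09:00:00.000"])})
df["date"] = df["timestamp"].dt.date  # object column of datetime.date

# Option 1: compare against a value of the same type (datetime.date)
by_date = df[df["date"] == datetime.date(2020, 2, 1)]

# Option 2: compare string representations; str(date) gives 'YYYY-MM-DD'
by_str = df[df["date"].astype(str) == "2020-02-01"]

print(len(by_date), len(by_str))
```

Option 1 avoids creating a temporary string column. Option 2 is convenient when the target date already arrives as a string.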


Your df['date'] column holds datetime.date objects, while you are comparing it with a string on the line df = df[df['date'] == '2020-02-01']. Have a look at the solution below:

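import pandas as pd

# Sample timestamps spanning three days
dic = {'timestamp': ['2020-02-01 01:22:52.717', '2020-02-01 01:24:52.717', '2020-02-02 01:22:52.717',
                     '2020-02-03 01:22:52.717']}

df = pd.DataFrame(dic)

df['timestamp'] = pd.to_datetime(df['timestamp'])
print(df['timestamp'])


start_date = pd.to_datetime('2020-02-01 00:00:00')
end_date = pd.to_datetime('2020-03-01 00:00:00')

# .copy() avoids a SettingWithCopyWarning when a column is added to the filtered frame
df = df.loc[(df['timestamp'] > start_date) & (df['timestamp'] < end_date)].copy()

# .dt.date produces datetime.date objects (object dtype), not strings
df['date'] = df['timestamp'].dt.date
print(df['date'])
df = df.sort_values(by=['date'])
# Compare against a datetime.date so both sides have the same type
# (in pandas 2.0+ a Timestamp no longer compares equal to a datetime.date)
df = df[df['date'] == pd.to_datetime('2020-02-01').date()]

print(df)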

Output:

0   2020-02-01 01:22:52.717
1   2020-02-01 01:24:52.717
2   2020-02-02 01:22:52.717
3   2020-02-03 01:22:52.717
Name: timestamp, dtype: datetime64[ns]
0    2020-02-01
1    2020-02-01
2    2020-02-02
3    2020-02-03
Name: date, dtype: object
                timestamp        date
0 2020-02-01 01:22:52.717  2020-02-01
1 2020-02-01 01:24:52.717  2020-02-01
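Since the question says the date column is only there so the frame can be grouped by day later, another option is to keep it as datetime64 with .dt.normalize() instead of converting to datetime.date. The sample rows and the data values below are assumptions for illustration. A datetime64 column accepts string comparisons (pandas parses the string into a Timestamp) and also works directly as a groupby key:

```python
import pandas as pd

# Sample rows (assumed), loosely following the question's columns
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-02-01 01:22:52.717",
                                 "2020-02-01 14:11:22.633",
                                 "2020-02-02 01:22:52.717"]),
    "data": [21.0, 38.0, 39.0],
})

# normalize() sets the time part to midnight but keeps a datetime64 dtype
df["date"] = df["timestamp"].dt.normalize()

# Comparing a datetime64 column with a date string works as expected
one_day = df[df["date"] == "2020-02-01"]
print(one_day)

# The same column serves as a per-day grouping key
daily_mean = df.groupby("date")["data"].mean()
print(daily_mean)
```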
