Here is my code block:

import pandas as pd
import datetime as dt
first_day = dt.date(todays_year, todays_month, 1)

print(first_day)
>2021-02-01

print(type(first_day))
><class 'datetime.date'>
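
The snippet above does not show where todays_year and todays_month come from, and last_day (used further down) is never defined either. As a minimal sketch, assuming both bounds are meant to be the first and last calendar day of a given month, they could be derived like this (month_bounds is a hypothetical helper name, not part of the original code):

```python
import calendar
import datetime as dt


def month_bounds(day):
    """Return the first and last calendar day of the month containing `day`."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=1), day.replace(day=days_in_month)


# A fixed date is used here so the result is reproducible;
# dt.date.today() could be passed instead.
first_day, last_day = month_bounds(dt.date(2021, 2, 15))
print(first_day, last_day)
```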

The following code runs successfully:

df = pd.read_excel('AllServiceActivities.xlsx',
                   sheet_name='All Service Activities',
                   usecols=[7, 12, 13]).query(f'Resources.str.contains("{name} {surname}")',
                                              engine='python')

However, I also want to do something like this ("Scheduled Start" is my column name):

df = pd.read_excel('AllServiceActivities.xlsx',
                   sheet_name='All Service Activities',
                   usecols=[7, 12, 13]).query(f'Scheduled Start >= {first_day})',
                                              engine='python')

As you can guess, it does not work.

There are solutions such as "Select DataFrame rows between two dates", but I want to use the query method because I don't want to pass along all of the irrelevant data.

Edit (to generate test data):

dtr = [dt.datetime(2021,1,27,12,0),
dt.datetime(2021,2,3,10,0),
dt.datetime(2021,1,25,9,0),
dt.datetime(2021,1,15,7,59),
dt.datetime(2021,1,13,10,59),
dt.datetime(2021,1,12,13,59),
dt.datetime(2021,1,11,13,59),
dt.datetime(2021,2,2,9,29),
dt.datetime(2021,1,20,7,59),
dt.datetime(2021,1,19,10,59),
dt.datetime(2021,2,1,10,0),
dt.datetime(2021,1,19,7,59),
dt.datetime(2021,1,29,7,59),
dt.datetime(2021,1,28,13,0),
dt.datetime(2021,1,28,10,59),
dt.datetime(2021,1,27,19,30),
dt.datetime(2021,1,27,13,30),
dt.datetime(2021,1,18,17,30),
dt.datetime(2021,1,19,9,0),
dt.datetime(2021,1,18,13,0),
dt.datetime(2021,2,1,14,19),
dt.datetime(2021,1,29,14,30),
dt.datetime(2021,1,14,13,0),
dt.datetime(2021,1,8,13,0),
dt.datetime(2021,1,26,10,59),
dt.datetime(2021,1,25,10,0),
dt.datetime(2021,1,23,16,0),
dt.datetime(2021,1,21,10,0),
dt.datetime(2021,1,18,10,59),
dt.datetime(2021,1,11,13,30),
dt.datetime(2021,1,20,22,0),
dt.datetime(2021,1,20,21,0),
dt.datetime(2021,1,22,19,59),
dt.datetime(2021,1,12,13,59),
dt.datetime(2021,1,21,13,59),
dt.datetime(2021,1,20,10,30),
dt.datetime(2021,1,19,16,59),
dt.datetime(2021,1,19,10,0),
dt.datetime(2021,1,14,9,29),
dt.datetime(2021,1,19,8,53),
dt.datetime(2021,1,18,10,59),
dt.datetime(2021,1,13,16,0),
dt.datetime(2021,1,13,15,0),
dt.datetime(2021,1,12,13,59),
dt.datetime(2021,1,11,10,0),
dt.datetime(2021,1,8,9,0),
dt.datetime(2021,1,7,13,0),
dt.datetime(2021,1,6,13,59),
dt.datetime(2021,1,5,12,0),
dt.datetime(2021,1,10,0,0),
dt.datetime(2020,12,8,13,0),
dt.datetime(2021,1,7,11,10),
dt.datetime(2021,1,6,8,12),
dt.datetime(2021,1,5,10,0),
dt.datetime(2021,1,5,15,15),
dt.datetime(2021,1,4,7,59)]

df1 = pd.DataFrame(dtr, columns=['Scheduled Start'])
df2 = df1.query("'Scheduled Start' >= @first_day")

Thanks!

2 Answers

1

Without a reproducible example, it's hard to know for sure, but try this. It uses the @ character to reference variables.

df = pd.read_excel(
    'AllServiceActivities.xlsx',
    sheet_name='All Service Activities',
    usecols=[7, 12, 13]) \
      .query('`Scheduled Start` >= @first_day')  # backticks quote a column name that contains a space
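
For reference, here is a small self-contained sketch of the same idea on in-memory data, so it can be tried without the Excel file. The four sample rows are taken from the test data in the question. Converting first_day to a pd.Timestamp (the cutoff variable) is my own addition, on the assumption that comparing a datetime column against a plain datetime.date may not be accepted by every pandas version:

```python
import datetime as dt

import pandas as pd

df1 = pd.DataFrame(
    {
        "Scheduled Start": [
            dt.datetime(2021, 1, 27, 12, 0),
            dt.datetime(2021, 2, 3, 10, 0),
            dt.datetime(2021, 2, 1, 10, 0),
            dt.datetime(2021, 1, 4, 7, 59),
        ]
    }
)

first_day = dt.date(2021, 2, 1)
# Convert the date to a Timestamp so both sides of the comparison are datetime-like.
cutoff = pd.Timestamp(first_day)

# Backticks mark `Scheduled Start` as a column name; plain quotes would
# turn it into a string literal instead.
df2 = df1.query("`Scheduled Start` >= @cutoff", engine="python")
print(df2)
```

Only the two February rows should remain in df2.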

4 Comments

It didn't work. I added some of my data so you can reproduce the problem. Thanks for the help :).
When I tried this: > df2 = df1.query("'Scheduled Start' >= @first_day", engine='python') I got this: > TypeError: '>=' not supported between instances of 'str' and 'datetime.date'. So then I tried this: > df2 = df1.query("datetime.strptime('Scheduled Start', '%Y-%m-%d %H:%M') >= @first_day", engine='python') and got this: > ValueError: time data 'Scheduled Start' does not match format '%Y-%m-%d %H:%M'. Pandas evaluates the datetime data type as a string. I cannot change that because the data is pipelined via Excel.
I also tried this: > df2 = df1.query("datetime.strptime('Scheduled Start', '%Y-%m-%d %H:%M:%S') >= @first_day", engine='python') which gave the same error. To check my string format I used this: > for i in dtr: print(str(i)); print(dt.datetime.strptime(str(i), '%Y-%m-%d %H:%M:%S')) and it worked fine.
In the read_excel() function you need to tell pandas to import your 'Scheduled Start' column as a datetime, with something like parse_dates=[x], where x is the column number of the date column (7, 12, or 13 from your example above). See the docs. You're getting the error because pandas doesn't yet know that your 'Scheduled Start' column contains dates, so it can't compare it to other dates.
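
To illustrate the last comment without needing the Excel file, here is a rough sketch that uses read_csv on an in-memory string as a stand-in; read_excel accepts a parse_dates argument as well. The CSV text and the sample rows are made up for the example:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
csv_text = (
    "Resources,Scheduled Start\n"
    "A,2021-01-27 12:00:00\n"
    "B,2021-02-03 10:00:00\n"
)

# Without parse_dates the column is read as text, not as datetimes.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted while reading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Scheduled Start"])

print(raw["Scheduled Start"].dtype)
print(parsed["Scheduled Start"].dtype)

# Once the column holds datetimes, it can be compared against a Timestamp.
cutoff = pd.Timestamp(2021, 2, 1)
recent = parsed.query("`Scheduled Start` >= @cutoff", engine="python")
print(recent)
```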
0

First, thanks for guiding me, @mullinscr.

I found extra information about date_parser and parse_dates here:

https://www.programcreek.com/python/example/101346/pandas.read_excel

date_parser is a parser function specific to my case.

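def date_parser(x):
    # Parse one cell: drop fractional seconds and treat 1899 placeholder dates as missing.
    # dt.datetime replaces pd.datetime, which newer pandas versions no longer provide.
    text = str(x)
    if "." in text:
        return dt.datetime.strptime(text.split(".")[0], "%Y-%m-%d %H:%M:%S")
    if "1899" in text:
        return None
    return dt.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")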
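
As an alternative to a per-cell parser, the same cleanup can be sketched with vectorized pandas operations after loading. This is a different technique from the one above, not a drop-in replacement: it checks the parsed year rather than searching the text for "1899", and the three sample strings are made up for illustration:

```python
import pandas as pd

# Hypothetical raw values: one with fractional seconds, one plain,
# and one 1899 placeholder date.
raw = pd.Series(
    [
        "2021-01-27 12:00:00.123000",
        "2021-02-03 10:00:00",
        "1899-12-30 00:00:00",
    ]
)

# Drop anything after the first "." and parse the rest in one call.
trimmed = raw.str.split(".").str[0]
parsed = pd.to_datetime(trimmed, format="%Y-%m-%d %H:%M:%S")

# Replace 1899 placeholder dates with NaT (missing).
parsed = parsed.where(parsed.dt.year != 1899)
print(parsed)
```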


df = pd.read_excel(
    'AllServiceActivities.xlsx',
    sheet_name='All Service Activities',
    header=None,
    names=["Resources", "Start", "End"],
    skiprows=1,
    usecols=[7, 12, 13],
    parse_dates=[1],
    date_parser=date_parser,
).query(
    "Start >= @first_day and End <= @last_day and Resources.str.contains('{} {}')".format(name, surname),
    engine='python',
)
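
The combined filter can also be tried on a small in-memory frame. This is only a sketch: the names and rows below are invented, the bounds are given as pd.Timestamp values rather than datetime.date, and the name is passed in through @full_name (a hypothetical variable) instead of string formatting:

```python
import pandas as pd

# Invented sample rows using the column names from the answer above.
df = pd.DataFrame(
    {
        "Resources": ["Jane Doe", "Jane Doe; John Roe", "John Roe", "Jane Doe"],
        "Start": pd.to_datetime(
            ["2021-02-01 10:00", "2021-02-03 09:00", "2021-02-05 09:00", "2021-01-27 12:00"]
        ),
        "End": pd.to_datetime(
            ["2021-02-01 11:00", "2021-02-03 10:00", "2021-02-05 10:00", "2021-01-27 13:00"]
        ),
    }
)

name, surname = "Jane", "Doe"
full_name = f"{name} {surname}"
first_day = pd.Timestamp(2021, 2, 1)
last_day = pd.Timestamp(2021, 2, 28)

# All three conditions are evaluated in one query call; engine="python"
# is needed for the .str accessor.
result = df.query(
    "Start >= @first_day and End <= @last_day and Resources.str.contains(@full_name)",
    engine="python",
)
print(result)
```

Note that with last_day at midnight, an activity ending later on the last day of the month would be excluded, so the upper bound may need adjusting depending on what is wanted.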

Hope this helps everyone :).
