I'm trying to apply this function to a pandas DataFrame to check whether a taxi pickup or dropoff time falls within the ranges I built from the arrivemin, arrivemax, departmin and departmax variables below.

If the time falls inside the range, I want to keep the row. If it's outside the range, I want to drop the row from the DataFrame.

Start.Time, End.Time, etc. are all datetime objects, so the time arithmetic should work fine.

def time_function(df, row):
    gametimestart = df['Start.Time'] 
    gametimeend = df['End.Time'] 
    arrivemin = gametimestart - datetime.timedelta(minutes=120) 
    arrivemax = gametimeend - datetime.timedelta(minutes = 30) 
    departmin = gametimeend - datetime.timedelta(minutes = 60) 
    departmax = gametimeend + datetime.timedelta(minutes = 90)
    for not i in ((df['pickup_datetime'] > arrivemin) & (df['pickupdatetime'] < arrivemax) &(df['dropoff_datetime'] > departmin) & (df['dropoffdatetime'] < departmax)):
        df = df.drop[df[i.index]]
    return


for index, row in yankdf:
    time_function(yankdf, row)

I keep getting this syntax error:

 File "<ipython-input-25-bda6fb2db429>", line 17
    for not i in (((row['pickup_datetime'] > arrivemin) & (row['pickupdatetime'] < arrivemax)) | ((row['dropoff_datetime'] > departmin) & (row['dropoffdatetime'] < departmax)):
          ^
SyntaxError: invalid syntax
  • Well, your indenting is wrong for the body of time_function. Could that be it? Commented Nov 20, 2015 at 18:25
  • whoops... that was just a copy paste error. Edited above Commented Nov 20, 2015 at 18:27
  • Please edit your question and include the full error traceback. Commented Nov 20, 2015 at 18:28
  • interesting... my error message was on duplicate lines, it appears the error is with the "not" and not the departmin Commented Nov 20, 2015 at 18:30

1 Answer

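I don't think you need the function or the row loop. The syntax error itself comes from "for not i in ...", which is not valid Python: a for statement needs a loop target right after "for", and "not i" is not one. Instead, build the four conditions as a boolean mask and subset the DataFrame with it; df_filtered will then contain only the rows that fall inside both windows.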

import datetime

# Per-row arrival and departure windows around each game
gametimestart = df['Start.Time']
gametimeend = df['End.Time']
arrivemin = gametimestart - datetime.timedelta(minutes=120)
arrivemax = gametimeend - datetime.timedelta(minutes=30)
departmin = gametimeend - datetime.timedelta(minutes=60)
departmax = gametimeend + datetime.timedelta(minutes=90)

# Keep only the rows where all four comparisons are True
# (column names assumed to be 'pickup_datetime' and 'dropoff_datetime')
df_filtered = df[(df['pickup_datetime'] > arrivemin) &
                 (df['pickup_datetime'] < arrivemax) &
                 (df['dropoff_datetime'] > departmin) &
                 (df['dropoff_datetime'] < departmax)]
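
To see the filter work end to end, here is a minimal, self-contained sketch. The sample rows are invented for illustration, the column names 'pickup_datetime' and 'dropoff_datetime' are assumed to be the real ones, and pd.Timedelta stands in for datetime.timedelta (either should work on datetime columns). The names keep and df_dropped are not from the question; they only make the mask and its negation explicit.

```python
import pandas as pd

# Hypothetical sample data: three taxi trips matched to two games.
df = pd.DataFrame({
    "Start.Time": pd.to_datetime(
        ["2015-04-06 13:00", "2015-04-06 13:00", "2015-04-08 19:00"]),
    "End.Time": pd.to_datetime(
        ["2015-04-06 16:00", "2015-04-06 16:00", "2015-04-08 22:00"]),
    "pickup_datetime": pd.to_datetime(
        ["2015-04-06 15:10", "2015-04-06 09:00", "2015-04-08 21:05"]),
    "dropoff_datetime": pd.to_datetime(
        ["2015-04-06 15:25", "2015-04-06 09:20", "2015-04-08 21:20"]),
})

# Per-row windows, computed for the whole column at once.
arrivemin = df["Start.Time"] - pd.Timedelta(minutes=120)
arrivemax = df["End.Time"] - pd.Timedelta(minutes=30)
departmin = df["End.Time"] - pd.Timedelta(minutes=60)
departmax = df["End.Time"] + pd.Timedelta(minutes=90)

# Boolean mask: True where the row satisfies all four comparisons.
keep = (
    (df["pickup_datetime"] > arrivemin)
    & (df["pickup_datetime"] < arrivemax)
    & (df["dropoff_datetime"] > departmin)
    & (df["dropoff_datetime"] < departmax)
)

df_filtered = df[keep]   # rows inside both windows
df_dropped = df[~keep]   # "~" negates the mask element-wise

print(df_filtered.index.tolist())  # [0, 2]
```

Note that "~" is the element-wise negation for a boolean Series, which is what the "not" in the original loop was reaching for. If a row should be kept when either window matches (the condition shown in the traceback uses "|" between the pickup and dropoff checks), wrap each pair of comparisons in parentheses and join the two pairs with "|" instead of "&".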