
I'm looking for a way to filter my DataFrame by several criteria. Example DataFrame:

id  Arrest  Shift_num  Description
0   True    20         Weapon
1   False   25         unarmed
2   True    30         Weapon 

I would like to get the rows where Description == 'Weapon', Shift_num >= 25 and Arrest == True (for example).

After a few tries, this is what I came up with, but I think it can be done better:

arrest = (df.Arrest == True)
shift = (df.Shift_num >= 25)
weap = (df['Description'] == 'Weapon')  # string comparison is case-sensitive

print(df[arrest & shift & weap])
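Because == on strings is case-sensitive, 'weapon' will not match 'Weapon'. If the Description column might contain mixed casing, a minimal sketch (assuming you want case-insensitive matching) is to normalize the column with .str.lower() before comparing. The frame below just rebuilds the sample data:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2],
                   "Arrest": [True, False, True],
                   "Shift_num": [20, 25, 30],
                   "Description": ["Weapon", "unarmed", "Weapon"]})

# == on strings is case-sensitive: 'weapon' does not match 'Weapon'
exact = df[df["Description"] == "weapon"]

# Lower-casing the column first makes the match case-insensitive
weap = df["Description"].str.lower() == "weapon"
result = df[(df.Arrest == True) & (df.Shift_num >= 25) & weap]
print(result)
```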

Thanks in advance :)

  • df[df['Arrest'].eq(True) & df['Shift_num'].ge(25) & df['Description'].eq('Weapon')] Commented Dec 4, 2019 at 18:32
  • Yours looks good to me. Commented Dec 4, 2019 at 18:32
  • Your solution is typically how it's done in Pandas. It may be slightly better to apply all three masks at once rather than storing each as a variable. Commented Dec 4, 2019 at 18:32
  • See Boolean Indexing in the documentation: pandas.pydata.org/pandas-docs/stable/user_guide/… Commented Dec 4, 2019 at 18:53
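The method-based version from the first comment can be run end to end on the sample data. Series.eq() and Series.ge() are the method forms of == and >=, which some people find easier to chain. A sketch, rebuilding the question's frame:

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2],
                   "Arrest": [True, False, True],
                   "Shift_num": [20, 25, 30],
                   "Description": ["Weapon", "unarmed", "Weapon"]})

# Series.eq / Series.ge are the method equivalents of == and >=
mask = (df["Arrest"].eq(True)
        & df["Shift_num"].ge(25)
        & df["Description"].eq("Weapon"))
result = df[mask]
print(result)
```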

3 Answers


You can use df.query (a bonus: if numexpr is installed, query uses it as the evaluation engine, which is highly optimized):

import pandas as pd

df = pd.DataFrame({"Arrest": [True, False, True], 
                   "Shift_num": [20, 25, 30], 
                   "Description": ["Weapon", "unarmed", "Weapon"]})

df.query("Arrest & Shift_num >= 25 & Description == 'Weapon'")

Output:

   Arrest  Shift_num Description
2    True         30      Weapon

Some notes:

  • Don't forget to 'quote' strings
  • Column names can be used directly, without prefixing them with df
  • Use ~Arrest when you want the rows that were NOT arrested
  • You can use @ to refer to a variable in the local Python scope (i.e. not a column of the df)
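The ~ and @ notes can be seen together in one short sketch; min_shift is just a hypothetical local variable chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Arrest": [True, False, True],
                   "Shift_num": [20, 25, 30],
                   "Description": ["Weapon", "unarmed", "Weapon"]})

# Hypothetical local variable, referenced inside the query string with @
min_shift = 25

# Arrested, on a shift >= min_shift, carrying a weapon
arrested = df.query("Arrest & Shift_num >= @min_shift & Description == 'Weapon'")

# ~Arrest selects the rows where Arrest is False
not_arrested = df.query("~Arrest & Shift_num >= @min_shift")

print(arrested)
print(not_arrested)
```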

I encourage you to read about numexpr.
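To see where numexpr fits in, the same query can be run with the default engine and with engine="python", a keyword that DataFrame.query forwards to pandas.eval. A minimal sketch; any speed-up from numexpr is only expected on large frames, so this just checks that both engines return the same rows:

```python
import pandas as pd

df = pd.DataFrame({"Arrest": [True, False, True],
                   "Shift_num": [20, 25, 30],
                   "Description": ["Weapon", "unarmed", "Weapon"]})

expr = "Arrest & Shift_num >= 25 & Description == 'Weapon'"

# Default engine: numexpr if it is installed, otherwise pure Python
default_result = df.query(expr)

# Force the pure-Python engine (works without numexpr)
python_result = df.query(expr, engine="python")

print(default_result.equals(python_result))
```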


2 Comments

Amazing, thank you very much :)))) Appreciate it.
@MaxBoyar You're welcome! Feel free to accept the answer if it solved your problem.

You can try boolean indexing with .loc:

import pandas as pd

df = pd.DataFrame({'Arrest':[True,False,True],'Shift_num':[20,25,30],'Description':['Weapon','unarmed','Weapon']})

# >= 25 to match the requirement in the question
df.loc[(df['Description'] == 'Weapon') & (df['Shift_num'] >= 25) & (df['Arrest'] == True)]
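One reason to prefer .loc over plain df[mask] is that it takes a row selector and a column selector together, so you can filter rows and pick columns in one step. A sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Arrest': [True, False, True],
                   'Shift_num': [20, 25, 30],
                   'Description': ['Weapon', 'unarmed', 'Weapon']})

mask = (df['Description'] == 'Weapon') & (df['Shift_num'] >= 25) & df['Arrest']

# .loc[rows, columns]: a single column label returns a Series
shifts = df.loc[mask, 'Shift_num']
# a list of column labels returns a DataFrame
subset = df.loc[mask, ['Shift_num', 'Description']]

print(shifts)
print(subset)
```

The same mask also works on the left-hand side of an assignment, e.g. df.loc[mask, 'Shift_num'] = 0, which is the usual way to update only the filtered rows.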

Comments


What you've got works. Here is a one-liner that may be slightly more efficient. Since Arrest is a boolean column, you can use it directly as a mask instead of comparing it with == True.

In [5]: df[(df.Description == 'Weapon') & (df.Shift_num >= 25) & (df.Arrest)] 
Out[5]: 
   id  Arrest  Shift_num Description
2   2    True         30      Weapon
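If the number of criteria grows or is built at runtime, one option (not part of the answer above) is to keep the masks in a list and AND them together with functools.reduce and operator.and_. A sketch using the question's sample data:

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2],
                   "Arrest": [True, False, True],
                   "Shift_num": [20, 25, 30],
                   "Description": ["Weapon", "unarmed", "Weapon"]})

# One boolean Series per criterion; add or remove entries as needed
conditions = [
    df.Arrest,                    # boolean column, used directly
    df.Shift_num >= 25,
    df.Description == "Weapon",
]

# Equivalent to conditions[0] & conditions[1] & conditions[2]
combined = functools.reduce(operator.and_, conditions)
result = df[combined]
print(result)
```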

1 Comment

Thanks, worked for me, appreciate it :)
