3

I have created a pandas data frame and would like to filter the data based on certain boolean logic. Essentially what I'd like to do is closer to excels' index match function than to simple filtering. I have researched a lot of other threads.

  1. When I apply my filter, the data frame returns zero true values. Why are zero true values being returned when I have been flexible with my logic? and;

  2. If I introduced a 5th column, say column 'D', with random.randomint(100-1000,100), what logic would I use to conditionally find the maximum values only for column D ? I.e. Can I force a data frame to return the highest true values only from a certain column, in the event that multiple true values are returned?

Advice much appreciated. Thank you in advance.

import pandas as pd

df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5]
})

columns = ['Step','A','B','C']

df=df[columns]

new_df=df[(df.Step == 1) & (df.A == 4|5|6|7) & (df.B == 10|20|30|40)]
new_df
1
  • 1
    Can you add some sample for 2. ? Do you need one largest value? Commented Jul 25, 2017 at 14:30

2 Answers 2

4

Using DataFrame.query() method:

In [7]: new_df = df.query("Step==1 and A in [4,5,6,7] and B in [10,20,30,40]")

In [8]: new_df
Out[8]:
   Step  A   B    C
0     1  4  10  0.0
1     1  5  20  0.5
2     1  6  30  1.0
3     1  7  40  1.5
4     1  4  10  2.0
5     1  5  20  2.5
Sign up to request clarification or add additional context in comments.

Comments

4

You can use boolean indexing with isin:

new_df=df[(df.Step == 1) & (df.A.isin([4,5,6,7])) & (df.B.isin([10,20,30,40]))]

It seems for second question need DataFrame.nlargest:

np.random.seed(789)
df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5],
    'D':np.random.choice(np.arange(100,1000,100), size=12)
})
print (df)
    A   B    C    D  Step
0   4  10  0.0  400     1
1   5  20  0.5  300     1
2   6  30  1.0  200     1
3   7  40  1.5  400     1
4   4  10  2.0  500     1
5   5  20  2.5  900     1
6   6  30  0.0  500     2
7   7  40  0.5  200     2
8   4  10  1.0  900     2
9   5  20  1.5  100     2
10  6  30  2.0  200     2
11  7  40  2.5  200     2

new_df= df[(df.Step == 1)&(df.A.isin([4,5,6,7]))&(df.B.isin([10,20,30,40]))].nlargest(1,'D')
print (new_df)
   A   B    C    D  Step
5  5  20  2.5  900     1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.