Dataframe Boolean Logic Index Match

Question

I have created a pandas data frame and would like to filter the data based on certain boolean logic. Essentially what I'd like to do is closer to excels' index match function than to simple filtering. I have researched a lot of other threads.

When I apply my filter, the data frame returns zero true values. Why are zero true values being returned when I have been flexible with my logic? and;
If I introduced a 5th column, say column 'D', with random.randomint(100-1000,100), what logic would I use to conditionally find the maximum values only for column D ? I.e. Can I force a data frame to return the highest true values only from a certain column, in the event that multiple true values are returned?

Advice much appreciated. Thank you in advance.

import pandas as pd

df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5]
})

columns = ['Step','A','B','C']

df=df[columns]

new_df=df[(df.Step == 1) & (df.A == 4|5|6|7) & (df.B == 10|20|30|40)]
new_df

Can you add some sample for 2. ? Do you need one largest value? — jezrael
– jezrael, Commented Jul 25, 2017 at 14:30

MaxU - stand with Ukraine · Accepted Answer · 2017-07-25 14:25:15Z

4

Using DataFrame.query() method:

In [7]: new_df = df.query("Step==1 and A in [4,5,6,7] and B in [10,20,30,40]")

In [8]: new_df
Out[8]:
   Step  A   B    C
0     1  4  10  0.0
1     1  5  20  0.5
2     1  6  30  1.0
3     1  7  40  1.5
4     1  4  10  2.0
5     1  5  20  2.5

answered Jul 25, 2017 at 14:25

MaxU - stand with Ukraine

212k37 gold badges402 silver badges436 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jezrael · Accepted Answer · 2017-07-25 14:36:38Z

You can use boolean indexing with isin:

new_df=df[(df.Step == 1) & (df.A.isin([4,5,6,7])) & (df.B.isin([10,20,30,40]))]

It seems for second question need DataFrame.nlargest:

np.random.seed(789)
df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5],
    'D':np.random.choice(np.arange(100,1000,100), size=12)
})
print (df)
    A   B    C    D  Step
0   4  10  0.0  400     1
1   5  20  0.5  300     1
2   6  30  1.0  200     1
3   7  40  1.5  400     1
4   4  10  2.0  500     1
5   5  20  2.5  900     1
6   6  30  0.0  500     2
7   7  40  0.5  200     2
8   4  10  1.0  900     2
9   5  20  1.5  100     2
10  6  30  2.0  200     2
11  7  40  2.5  200     2

new_df= df[(df.Step == 1)&(df.A.isin([4,5,6,7]))&(df.B.isin([10,20,30,40]))].nlargest(1,'D')
print (new_df)
   A   B    C    D  Step
5  5  20  2.5  900     1

Collectives™ on Stack Overflow

Dataframe Boolean Logic Index Match

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related