Query dataframe columns using multiple variables/inputs

Question

I've encountered a situation where I need to filter a dataframe via input(s) that are found in columns P1-P5 below. There could be anywhere from 1 to 5 inputs and they could be located in any of P1-P5.

   TeamAbb    P1       P2      P3      P4      P5
0     ATL1  203953  1627745 1629027 1629629 1629631
1     ATL2  203953  1627745 1627761 1629027 1629631
2     ATL3  203458  203953  1627761 1629027 1629631
3     ATL4  203458  203953  1629027 1629629 1629631
4     ATL5  203458  1628381 1629027 1629629 1629631
5     ATL6  203953  1628981 1628989 1629027 1629631
6     ATL7  203953  1627745 1628989 1629027 1629631
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 1

input_val = [1713]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 2

input_val = [1713,202323]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981

So far none of the methods I've tried (query, apply/any and mask) has worked. If anyone has ideas on how to approach this I'd really appreciate it.

Ben.T · Accepted Answer · 2019-12-03 22:56:57Z

1

For each value in your list, build a boolean dataframe with eq and sum those dataframes together. Then sum the result across each row (axis=1) and check whether that row total equals the length of your input list:

input_val = [1713,202323]
mask = sum([df.eq(i) for i in input_val]).sum(1).eq(len(input_val))

print (df[mask])
  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981
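For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frame is rebuilt from the table in the question; the helper name rows_with_all is hypothetical (not part of pandas), and the sketch assumes a non-empty input list and that an ID never appears twice in the same row.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame(
    [
        ["ATL1", 203953, 1627745, 1629027, 1629629, 1629631],
        ["ATL2", 203953, 1627745, 1627761, 1629027, 1629631],
        ["ATL3", 203458, 203953, 1627761, 1629027, 1629631],
        ["ATL4", 203458, 203953, 1629027, 1629629, 1629631],
        ["ATL5", 203458, 1628381, 1629027, 1629629, 1629631],
        ["ATL6", 203953, 1628981, 1628989, 1629027, 1629631],
        ["ATL7", 203953, 1627745, 1628989, 1629027, 1629631],
        ["ATL8", 1713, 202323, 203459, 1627761, 1628981],
        ["ATL9", 1713, 203459, 1628981, 1629027, 1629631],
    ],
    columns=["TeamAbb", "P1", "P2", "P3", "P4", "P5"],
)


def rows_with_all(frame, values):
    """Hypothetical helper: rows in which every value appears in some column."""
    # One boolean frame per value; adding them gives a per-cell hit count.
    hits = sum(frame.eq(v) for v in values)
    # A row qualifies when its total hit count equals the number of inputs.
    return frame[hits.sum(axis=1).eq(len(values))]


print(rows_with_all(df, [1713]))
print(rows_with_all(df, [1713, 202323]))
```

With the sample data, the first call should keep rows 7 and 8 and the second only row 7, matching Example 1 and Example 2 in the question.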


1 Comment

Nick Over a year ago

Oh ya I know, was just rushing through the examples. Thanks for the help.

Brandon · Accepted Answer · 2019-12-04 01:33:41Z

0

I would use an apply across the rows and check the difference of the input and row sets:

input_val = [1713,202323]
df_filter = (
    df[['P{}'.format(i) for i in range(1, 6)]]
    # check that all input vals are found somewhere in the row
    .apply(lambda row: len(set(input_val) - set(row)) == 0, axis=1)
)
df_new = df[df_filter] # apply the filter
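The same set-based check can also be written with set.issubset, which some readers may find easier to follow than the empty-difference test. This is only a sketch: the three-row frame is a subset of the question's table, and the names player_cols and wanted are introduced here for illustration.

```python
import pandas as pd

# Three rows copied from the question's table, keeping their original index.
df = pd.DataFrame(
    [
        ["ATL7", 203953, 1627745, 1628989, 1629027, 1629631],
        ["ATL8", 1713, 202323, 203459, 1627761, 1628981],
        ["ATL9", 1713, 203459, 1628981, 1629027, 1629631],
    ],
    columns=["TeamAbb", "P1", "P2", "P3", "P4", "P5"],
    index=[6, 7, 8],
)

player_cols = ["P1", "P2", "P3", "P4", "P5"]  # illustrative name
input_val = [1713, 202323]
wanted = set(input_val)

# issubset accepts any iterable, so each row (a Series) can be passed directly.
df_filter = df[player_cols].apply(lambda row: wanted.issubset(row), axis=1)
df_new = df[df_filter]
print(df_new)
```

Because apply with axis=1 runs a Python function once per row, this is likely to be slower than a vectorized mask on large frames, but it is fine for small tables like this one.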


oppressionslayer · Accepted Answer · 2019-12-04 02:09:56Z

I think this achieves the result you're looking for:

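ee = df.isin([1713])  # True where a cell equals one of the input values
ee['match'] = ee[ee > 0].count(axis=1)  # number of matching cells in each row
df.loc[ee['match'] == ee['match'].max()]  # keep the rows with the most matches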

output:

  TeamAbb    P1      P2       P3       P4       P5
7    ATL8  1713  202323   203459  1627761  1628981
8    ATL9  1713  203459  1628981  1629027  1629631

Input:

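ee = df.isin([1713, 202323])  # True where a cell equals one of the input values
ee['match'] = ee[ee > 0].count(axis=1)  # number of matching cells in each row
df.loc[ee['match'] == ee['match'].max()]  # keep the rows with the most matches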

output:

  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981

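Input:

ee = df.isin([203953, 1628989])  # True where a cell equals one of the input values
ee['match'] = ee[ee > 0].count(axis=1)  # number of matching cells in each row
df.loc[ee['match'] == ee['match'].max()]  # keep the rows with the most matches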

output:

  TeamAbb      P1       P2       P3       P4       P5
5    ATL6  203953  1628981  1628989  1629027  1629631
6    ATL7  203953  1627745  1628989  1629027  1629631
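One thing to watch with selecting on the maximum match count: if no row contains every input value, the rows with the most partial matches are still returned. A hedged sketch of a stricter variant compares the count with the number of distinct inputs instead. It uses three rows from the question's table, the variable names are illustrative, and it assumes an ID never appears twice in the same row.

```python
import pandas as pd

# Three rows copied from the question's table, keeping their original index.
df = pd.DataFrame(
    [
        ["ATL7", 203953, 1627745, 1628989, 1629027, 1629631],
        ["ATL8", 1713, 202323, 203459, 1627761, 1628981],
        ["ATL9", 1713, 203459, 1628981, 1629027, 1629631],
    ],
    columns=["TeamAbb", "P1", "P2", "P3", "P4", "P5"],
    index=[6, 7, 8],
)

player_cols = ["P1", "P2", "P3", "P4", "P5"]  # illustrative name
input_val = [1713, 203953]  # no single row here contains both values

# Number of cells per row that equal any of the input values.
match = df[player_cols].isin(input_val).sum(axis=1)

by_max = df.loc[match == match.max()]           # rows with the most matches
strict = df.loc[match == len(set(input_val))]   # rows containing every input

print(by_max)
print(strict)
```

Here every row has exactly one match, so the max-based selection keeps all three rows while the strict selection comes back empty.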
