1

I've encountered a situation where I need to filter a dataframe via input(s) that are found in columns P1-P5 below. There could be anywhere from 1 to 5 inputs and they could be located in any of P1-P5.

   TeamAbb    P1       P2      P3      P4      P5
0     ATL1  203953  1627745 1629027 1629629 1629631
1     ATL2  203953  1627745 1627761 1629027 1629631
2     ATL3  203458  203953  1627761 1629027 1629631
3     ATL4  203458  203953  1629027 1629629 1629631
4     ATL5  203458  1628381 1629027 1629629 1629631
5     ATL6  203953  1628981 1628989 1629027 1629631
6     ATL7  203953  1627745 1628989 1629027 1629631
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 1

input_val = [1713]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 2

input_val = [1713,202323]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981

So far each method I've tried hasn't worked (query, apply/any and mask). If anyone has ideas on how to approach this I'd really apreciate it.

3 Answers 3

1

You can sum all the boolean dataframes equal (eq) to each value in your list and then sum the resulting summed dataframe over the axis=1 and then check if it is the same value as the length of your input list:

input_val = [1713,202323]
mask = sum([df.eq(i) for i in input_val]).sum(1).eq(len(input_val))

print (df[mask])
  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981
Sign up to request clarification or add additional context in comments.

1 Comment

Oh ya I know, was just rushing through the examples. Thanks for the help.
0

I would use an apply across the rows and check the difference of the input and row sets:

input_val = [1713,202323]
df_filter = (
             df[['P{}'.format(i) for i in range(1,6)]]
             .apply(lambda row: len(set(input_val) - set(row)) == 0 # check that all input vals are found somewhere in the row
                    , axis=1)
            ) 
df_new = df[df_filter] # apply the filter

Comments

0

I think this acheives the result your looking for:

ee = df
ee = e.isin([1713]) 
ee ['match'] = ee[ee>0].count(axis=1) 
df.loc[ee['match']==ee['match'].max()] 

output:

  TeamAbb    P1      P2       P3       P4       P5
7    ATL8  1713  202323   203459  1627761  1628981
8    ATL9  1713  203459  1628981  1629027  1629631

Input:

ee = df
ee = e.isin([1713, 202323]) 
ee ['match'] = ee[ee>0].count(axis=1) 
df.loc[ee['match']==ee['match'].max()] 

output:

  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981

input

ee = df 
ee = e.isin([203953,1628989])  
ee ['match'] = ee[ee>0].count(axis=1)  
df.loc[ee['match']==ee['match'].max()]   

output

  TeamAbb      P1       P2       P3       P4       P5
5    ATL6  203953  1628981  1628989  1629027  1629631
6    ATL7  203953  1627745  1628989  1629027  1629631

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.