filtering a dataframe using another dataframe

Question

import pandas as pd

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)

df_para = {'Y/N':['','y','n'],
        'Q/U':['unqualified','','unqualified']}
df_para = pd.DataFrame(df_para)

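I would like to filter df using df_para. Each row of df_para is one set of conditions (Y/N applies to column b and Q/U to column c, compared case-insensitively), and an empty cell means that column should not be filtered. My code is: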

df_output = pd.DataFrame()

for para in df_para.iterrows():
    df_result = df
    # filter Q/U
    if '' not in df_para['Q/U']:
        mask_qu = df_result['c'].str.lower().isin(df_para['Q/U'])
        df_result = df_result.loc[(mask_qu)]
        
    # filter Y/N
    if '' not in df_para['Y/N']:
        mask_yn = df_result['b'].str.lower().isin(df_para['Y/N'])
        df_result = df_result.loc[(mask_yn)]

    df_output = df_output.append(df_result)

With my code, df_output contains all rows of df three times. However, df_output should look like this:

   a  b            c
1  b  N  Unqualified
3  d  Y  Unqualified
5  f  N  Unqualified
0  a  Y    Qualified
2  c  Y    Qualified
3  d  Y  Unqualified
4  e  Y    Qualified
6  g  Y    Qualified
1  b  N  Unqualified
5  f  N  Unqualified

How could I fix it?

@jezrael do you mean if df_para = {'Y/N':[np.nan,np.nan], 'Q/U':['unqualified','unqualified']}? – yiyang chen, Commented Jun 23, 2022 at 6:07
@yiyangchen - yes, exactly. Then is the output the same as for {'Y/N':[np.nan], 'Q/U':['unqualified']}? – jezrael, Commented Jun 23, 2022 at 6:12

jezrael · Accepted Answer · 2022-06-23 07:46:58Z


The reason is that the in operator tests the index, so '' not in df_para['Q/U'] is always True here (the index is 0, 1, 2):

Using the Python in operator on a Series tests for membership in the index, not membership among the values.

If this behavior is surprising, keep in mind that using in on a Python dictionary tests keys, not values, and Series are dict-like.
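A minimal sketch of this behavior, using a Series with the same values as df_para['Q/U'] from the question (the name s is only for illustration). It compares in on the Series with in on its underlying values:

```python
import pandas as pd

# Same values as df_para['Q/U'] in the question
s = pd.Series(['unqualified', '', 'unqualified'])

# `in` on a Series checks the index labels (0, 1, 2), not the values
print('' in s)          # False
print(0 in s)           # True

# To test the values, check the underlying array or compare element-wise
print('' in s.values)   # True
print(s.eq('').any())   # True
```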

#pairs of (df column, df_para column) to filter on
cols = [('c','Q/U'), ('b','Y/N')]

#for each pair and each unique value in the df_para column, keep the df rows whose lower-cased value matches
dfs = [df[df[a].str.lower().eq(x)] for a, b in cols for x in df_para[b].unique()]

#join the sub-DataFrames
df_out = pd.concat(dfs)
print(df_out)
   a  b            c
1  b  N  Unqualified
3  d  Y  Unqualified
5  f  N  Unqualified
0  a  Y    Qualified
2  c  Y    Qualified
3  d  Y  Unqualified
4  e  Y    Qualified
6  g  Y    Qualified
1  b  N  Unqualified
5  f  N  Unqualified
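The comprehension above filters on each (column, value) pair independently, so it does not combine the Y/N and Q/U conditions of the same df_para row; for this sample data it still gives the expected output. If each df_para row is meant as a set of conditions that must all hold, with an empty string or NaN meaning "no filter" (as described in the comments), a row-by-row boolean mask is one option. This is a sketch, not the method from the answer; col_map is a hypothetical name for the df_para-to-df column mapping, which would need one entry per real column.

```python
import pandas as pd

data = {'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
        'b': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y'],
        'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
              'Qualified', 'Unqualified', 'Qualified']}
df = pd.DataFrame(data)

df_para = pd.DataFrame({'Y/N': ['', 'y', 'n'],
                        'Q/U': ['unqualified', '', 'unqualified']})

# Hypothetical mapping: df_para column -> df column it filters
col_map = {'Y/N': 'b', 'Q/U': 'c'}

# Lower-case each filtered df column once, up front
lowered = {p: df[c].str.lower() for p, c in col_map.items()}

dfs = []
for _, row in df_para.iterrows():
    # Start with every df row selected, then AND in each condition
    mask = pd.Series(True, index=df.index)
    for para_col in col_map:
        value = row[para_col]
        # Empty string or NaN means "do not filter on this column"
        if pd.isna(value) or value == '':
            continue
        mask &= lowered[para_col].eq(str(value).lower())
    dfs.append(df[mask])

# Collect the per-row results and concatenate once at the end
df_output = pd.concat(dfs)
print(df_output)
```

Supporting more columns only means adding entries to col_map, and NaN cells are skipped by the pd.isna check, which covers the np.nan case raised in the comments on the question.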

edited Jun 23, 2022 at 7:46

answered Jun 23, 2022 at 6:02

jezrael


yiyang chen Over a year ago

I see how it works, but the actual df_para has more than 20 columns, so with boolean indexing the code would be very redundant. Ideally, on each iteration, if a given cell is empty or NaN, it should skip to the next filtering condition.

jezrael Over a year ago

@yiyangchen - So you need to test 20 columns from df_result against 20 columns from df_para?

yiyang chen Over a year ago

I tried your updated method, but the actual df has more than 100000 rows and df_para has more than 20 columns. I have updated my post; could you please check it one more time? I am new to pandas, sorry about that.

jezrael Over a year ago

@yiyangchen - Is it possible to filter by the first column first, then by the second column, and so on, like in the edited answer?

yiyang chen Over a year ago

It is possible, but why are the rows with index 1, 3 and 6 returned twice? I only need them once. Other than that, it is perfect :)
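If each matching row should appear only once, one option is to drop repeated index labels after concatenating, keeping the first occurrence. This is a sketch; df_out below is a stand-in built from the question's data so that some rows appear twice, not the exact output of the edited answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                   'b': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y'],
                   'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
                         'Qualified', 'Unqualified', 'Qualified']})

# Stand-in for a concatenated result in which some rows appear twice
df_out = pd.concat([df.loc[[1, 3, 5]], df.loc[[0, 2, 3, 4, 6]], df.loc[[1, 5]]])

# Keep only the first occurrence of each original index label
df_unique = df_out[~df_out.index.duplicated(keep='first')]
print(df_unique)
```

Index.duplicated is used here rather than drop_duplicates because drop_duplicates compares column values, not index labels, so two distinct df rows that happen to hold identical values would also be collapsed.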


maya · Accepted Answer · 2022-06-23 06:11:35Z


Try this:

import pandas as pd
import numpy as np

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)


df_result = df[df["c"] == "Unqualified"]
print(df_result)
print(type(df_result))
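df_para stores lower-case values ('unqualified') while df stores 'Unqualified', so if the hard-coded string is replaced by a value taken from df_para, the column needs to be lower-cased first. A small sketch using only the question's column c:

```python
import pandas as pd

df = pd.DataFrame({'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
                         'Qualified', 'Unqualified', 'Qualified']})

# Exact comparison against the lower-case df_para value matches nothing
exact = df[df['c'] == 'unqualified']
print(exact.empty)  # True

# Lower-casing the column first makes the comparison case-insensitive
ci = df[df['c'].str.lower() == 'unqualified']
print(ci.index.tolist())  # [1, 3, 5]
```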

answered Jun 23, 2022 at 6:11

maya
