Pandas filter values which have both null and not null values in another column

Question

I have a dataframe and need to select the elements in the first column for which the second column contains both null and non-null values.

["1"]   ["2"]    
"A"    "Smthng"      
"B"    "sometext"      
"C"     NULL
"A"     NULL

For this case I should get A:

["1"]   ["2"]    
"A"    "Smthng"  
"A"     NULL

I did the following and it works, but is there a faster way to do it, ideally in one line of code?

What I have done:

NamesWithMissing = df[df['2'].isna()]['1'].tolist()
NamesWithMissing = df[(df['1'].isin(NamesWithMissing)) & (df['2'].notnull())]['1'].tolist()
df[df['1'].isin(NamesWithMissing)].sort_values(by="1")

Update

I found an interesting solution:

df.groupby('1').filter(lambda g: (g.nunique() > 1).any())

Shubham Sharma · Accepted Answer · 2021-05-12 17:10:30Z

3

We can create a mask with isna, group this mask by column 1, and transform it with nunique. A group that contains both null and non-null values has two distinct mask values (True and False), so we keep the rows where the result equals 2:

df[df['2'].isna().groupby(df['1']).transform('nunique').eq(2)]

   1       2
0  A  Smthng
3  A     NaN
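To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. The sample frame is rebuilt from the question, assuming NULL means NaN; the intermediate names `is_missing` and `distinct_states` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (NULL assumed to be NaN).
df = pd.DataFrame(
    {
        "1": ["A", "B", "C", "A"],
        "2": ["Smthng", "sometext", np.nan, np.nan],
    }
)

# Step 1: boolean mask, True where column "2" is missing.
is_missing = df["2"].isna()

# Step 2: for each group of column "1", count the distinct mask values
# and broadcast that count back onto every row of the group.
distinct_states = is_missing.groupby(df["1"]).transform("nunique")

# Step 3: a group holding both True and False has exactly 2 distinct values.
result = df[distinct_states.eq(2)]
print(result)
```

Because the mask only ever holds True or False, the count per group is 1 or 2, so `eq(2)` selects exactly the mixed groups.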

edited May 12, 2021 at 17:10

answered May 12, 2021 at 17:04

Shubham Sharma


Andrej Kesely · Accepted Answer · 2021-05-12 16:59:48Z

2

You can use boolean indexing:

m = (
    df.groupby("1")
    .transform(lambda x: (x.isna().sum() >= 1) & (x.notna().sum() >= 1))
    .values
)
print(df[m])

Prints:

   1       2
0  A  Smthng
3  A     NaN
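The same condition ("at least one null and at least one non-null per group") can also be sketched without a lambda by using the built-in `any` reduction on two masks. This is a variant for illustration, not the code from the answer above; it assumes the sample frame from the question with NULL stored as NaN, and the names `has_null` and `has_value` are made up here.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (NULL assumed to be NaN).
df = pd.DataFrame(
    {
        "1": ["A", "B", "C", "A"],
        "2": ["Smthng", "sometext", np.nan, np.nan],
    }
)

missing = df["2"].isna()

# True for every row whose group has at least one missing value.
has_null = missing.groupby(df["1"]).transform("any")

# True for every row whose group has at least one present value.
has_value = (~missing).groupby(df["1"]).transform("any")

# Keep only rows from groups that satisfy both conditions.
mask = has_null & has_value
print(df[mask])
```

The result is a one-dimensional boolean Series aligned with `df`, so it can be passed straight to `df[...]` without `.values`.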

answered May 12, 2021 at 16:59

Andrej Kesely
