I have a DataFrame and need to keep only the rows whose value in the first column occurs with both null and non-null values in the second column.

["1"]   ["2"]    
"A"    "Smthng"      
"B"    "sometext"      
"C"     NULL
"A"     NULL         

For this data I should get only the A rows:

["1"]   ["2"]    
"A"    "Smthng"  
"A"     NULL         
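
To make the example reproducible, here is a minimal sketch of the frame above. It assumes the column labels are the strings "1" and "2" and that NULL is a real missing value (None/NaN); the listing does not confirm either. The set intersection is only a plain check of which names qualify, not a proposed solution.

```python
import pandas as pd

# Assumption: column labels are the strings "1" and "2"; NULL is stored as None.
df = pd.DataFrame(
    {
        "1": ["A", "B", "C", "A"],
        "2": ["Smthng", "sometext", None, None],
    }
)

# Names with at least one null in "2", and names with at least one non-null.
has_null = set(df.loc[df["2"].isna(), "1"])
has_value = set(df.loc[df["2"].notna(), "1"])

# Names that appear in both sets are the ones to keep.
expected_names = sorted(has_null & has_value)
print(expected_names)  # ['A']
```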

The code below works, but maybe there is a faster way to do it, ideally in one line of code.

What I have done:

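# Names that have at least one null in column '2'
NamesWithMissing = df[df['2'].isna()]['1'].tolist()
# Of those, keep the names that also have a non-null value in column '2'
NamesWithMissing = df[(df['1'].isin(NamesWithMissing)) & (df['2'].notnull())]['1'].tolist()
# Select every row for the remaining names
df[df['1'].isin(NamesWithMissing)].sort_values(by="1")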

Update:

I found an interesting solution:

df.groupby('1').filter(lambda g: (g.nunique() > 1).any())
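
One caveat: `nunique` skips missing values by default, so if NULL is a real NaN, a group such as A (one string plus one NaN) counts as a single unique value. The sketch below is a hedged variant that passes `dropna=False` so NaN counts as its own value. The sample frame is an assumption rebuilt from the example in the question. This variant would also keep a group that holds two different non-null strings and no nulls, so it is only equivalent when each name has at most one distinct non-null value.

```python
import pandas as pd

# Assumed sample data: None stands in for NULL.
df = pd.DataFrame(
    {"1": ["A", "B", "C", "A"], "2": ["Smthng", "sometext", None, None]}
)

# Default behaviour: NaN is ignored, so group A has one unique value.
default_counts = df.groupby("1")["2"].nunique()

# With dropna=False, NaN is counted as a value, so group A has two.
counts_with_nan = df.groupby("1")["2"].nunique(dropna=False)

# Keep groups whose column "2" has more than one distinct value, NaN included.
result = df.groupby("1").filter(lambda g: g["2"].nunique(dropna=False) > 1)
print(result)
```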

2 Answers

We can create a mask with isna, group that mask by column 1, and transform it with nunique. A group that contains both null and non-null values has two distinct mask values (True and False), so we keep the rows where the result equals 2:

df[df['2'].isna().groupby(df['1']).transform('nunique').eq(2)]

   1       2
0  A  Smthng
3  A     NaN
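
The same one-liner, split into steps, may be easier to follow. This is only a sketch on an assumed sample frame (None standing in for NULL); the intermediate names `is_missing` and `distinct_flags` are made up for illustration.

```python
import pandas as pd

# Assumed sample data rebuilt from the question.
df = pd.DataFrame(
    {"1": ["A", "B", "C", "A"], "2": ["Smthng", "sometext", None, None]}
)

# Step 1: boolean mask, True where column "2" is missing.
is_missing = df["2"].isna()

# Step 2: for each name in column "1", count the distinct mask values
# and broadcast that count back to every row of the group.
distinct_flags = is_missing.groupby(df["1"]).transform("nunique")

# Step 3: a count of 2 means the group saw both True and False,
# i.e. it has both null and non-null values.
result = df[distinct_flags.eq(2)]
print(result)
```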

You can use boolean indexing:

m = (
    df.groupby("1")
    .transform(lambda x: (x.isna().sum() >= 1) & (x.notna().sum() >= 1))
    .values
)
print(df[m])

Prints:

   1       2
0  A  Smthng
3  A     NaN
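
A possible variant of the same idea selects column "2" explicitly, so the mask is a one-dimensional Series, and uses the built-in `any` and `all` reductions instead of a Python lambda. This is a sketch on an assumed sample frame, not a benchmarked replacement; the names `missing` and `by_name` are hypothetical.

```python
import pandas as pd

# Assumed sample data: None stands in for NULL.
df = pd.DataFrame(
    {"1": ["A", "B", "C", "A"], "2": ["Smthng", "sometext", None, None]}
)

# True where column "2" is missing, grouped by the name in column "1".
missing = df["2"].isna()
by_name = missing.groupby(df["1"])

# any  -> the group has at least one null
# ~all -> the group has at least one non-null
mask = by_name.transform("any") & ~by_name.transform("all")

result = df[mask]
print(result)
```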
