I am trying to delete the columns in which the number of non-zero values is less than a given number. This is the code I have, but it returns the same DataFrame. What am I doing wrong?

 df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])



   0  1  2  3
0  1  0  0  0
1  0  0  1  0

df = df.loc[:, (df.astype(bool).sum(axis=0) <= max_number_of_zeros)]

   0  1  2  3
0  1  0  0  0
1  0  0  1  0

Edit: example

   0  1  2  3
0  1  0  0  0
1  2  0  1  0
2  0  2  3  4
3  1  1  1  1

For value = 2, the output would be columns 0 and 2:

   0  1  2  3
0  1  0  0  0
1  2  0  1  0
2  0  2  3  4
3  1  1  1  1

1 Answer

I think you need to change the boolean mask to df.eq(0), which is the same as df == 0, and change the condition from <= to <:

max_number_of_zeros = 2
df = df.loc[:, df.eq(0).sum(axis=0) < max_number_of_zeros]
print(df)
   0  2
0  1  0
1  2  1
2  0  3
3  1  1

Detail (computed on the original 4x4 frame, before filtering):

print(df.eq(0))
       0      1      2      3
0  False   True   True   True
1  False   True  False   True
2   True  False  False  False
3  False  False  False  False

print(df.eq(0).sum(axis=0))
0    1
1    2
2    1
3    2
dtype: int64
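
To see how the two masks differ, it may help to compare the counts side by side. This is only a minimal sketch, assuming the 4x4 example frame from the question's edit and max_number_of_zeros = 2; the names nonzero_counts, zero_counts and kept are illustrative and not from the original post.

```python
import pandas as pd

# Example frame from the question's edit
df = pd.DataFrame([[1, 0, 0, 0],
                   [2, 0, 1, 0],
                   [0, 2, 3, 4],
                   [1, 1, 1, 1]])
max_number_of_zeros = 2

# astype(bool) marks non-zero cells, so this counts NON-zeros per column
nonzero_counts = df.astype(bool).sum(axis=0)

# eq(0) marks zero cells, so this counts zeros per column
zero_counts = df.eq(0).sum(axis=0)

# For every column the two counts add up to the number of rows
assert ((nonzero_counts + zero_counts) == len(df)).all()

# Keep only the columns with fewer than max_number_of_zeros zeros
kept = df.loc[:, zero_counts < max_number_of_zeros]
print(kept.columns.tolist())  # [0, 2]
```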

EDIT:

max_number_of_zeros = 2
# zeros per column = number of rows - non-zeros per column
df = df.loc[:, len(df) - df.astype(bool).sum(axis=0) < max_number_of_zeros]
print(df)
   0  2
0  1  0
1  2  1
2  0  3
3  1  1
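
One detail worth checking in the astype(bool) variant: the non-zero count is taken down each column, so it has to be subtracted from the number of rows. That equals the number of columns only for a square frame such as the 4x4 example. A small sketch with a hypothetical 3x2 frame (the data and the names below are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical non-square frame: 3 rows, 2 columns
df = pd.DataFrame([[1, 0],
                   [0, 0],
                   [5, 0]])
max_number_of_zeros = 2

# Zeros per column = number of rows minus non-zeros per column
zero_counts = len(df) - df.astype(bool).sum(axis=0)

# This agrees with counting the zeros directly
assert (zero_counts == df.eq(0).sum(axis=0)).all()

# Column 0 has 1 zero and is kept; column 1 has 3 zeros and is dropped
filtered = df.loc[:, zero_counts < max_number_of_zeros]
print(filtered.columns.tolist())  # [0]
```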

10 Comments

Shouldn't the answer be col1 and col2?
@ubuntu_noob - Sorry, the solution was edited multiple times; it should work now.
For number of zeros = 2 the df doesn't change.
Yes, that is expected, because you want to remove columns with more than 2 zeros, which means 3 or more.
I want to remove the columns which have 2 or more zeros.