
I have a dataframe with 80 columns. Out of those, there are some 45 columns that I need to check against an "all-zero" condition. If, for a given row, all 45 of those columns are zero, that row should be marked True; otherwise False.

Here's a sample dataframe for this problem:

df  = pd.DataFrame({'col1': [101,102,103,104,105,106,107,108,109,110,111],
                'col2': ['A','B','A','A','A','B','B','A','A','B','B'],
                'col3': [12e10,23e10,34e10,0,56e10,67e10,78e10,89e10,0,12e10,23e10],
                'col4': ['F','F','F','E','E','E','E','E','F','F','F'],
                'col5': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col6': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col7': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
                'col8': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col9': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col10': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col11': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col12': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col13': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col14': [12e10,0,34e10,45e10,0,67e10,0,0,0,12e10,23e10],
                'col15': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col16': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,0],
                'col17': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,0,0],
                'col18': [12e10,0,34e10,45e10,0,67e10,78e10,0,0,12e10,23e10],
                'col19': [12e10,0,34e10,0,0,67e10,78e10,0,0,12e10,23e10],
                'col20': [12e10,0,0,45e10,0,67e10,78e10,0,0,12e10,23e10],
               })

Now, in this example, I need to check the condition on all columns from col5 to col19. col3 and col20 are excluded, as are all the categorical columns.

I did this to filter out those rows.

df[(df['col5'] == 0) & (df['col6'] == 0) & (df['col7'] == 0) & ... & (df['col19'] == 0)]

This gets me the rows with col1 values 102, 105, 108, and 109, since they have zeros in all of col5 to col19. Now I need to put True in a new column df['mark'] for these rows, and False for the rest.

But I believe there has to be some easy way to do this?

2 Answers


You can use loc indexing to slice the column range, then check whether all entries in each row (axis=1) equal 0 (eq(0)):

df["mark"] = df.loc[:, "col5": "col19"].eq(0).all(axis=1)

to get

      col1 col2        col3 ...        col19         col20    mark
0    101    A  1.200000e+11 ...  1.200000e+11  1.200000e+11  False
1    102    B  2.300000e+11 ...  0.000000e+00  0.000000e+00   True
2    103    A  3.400000e+11 ...  3.400000e+11  0.000000e+00  False
3    104    A  0.000000e+00 ...  0.000000e+00  4.500000e+11  False
4    105    A  5.600000e+11 ...  0.000000e+00  0.000000e+00   True
5    106    B  6.700000e+11 ...  6.700000e+11  6.700000e+11  False
6    107    B  7.800000e+11 ...  7.800000e+11  7.800000e+11  False
7    108    A  8.900000e+11 ...  0.000000e+00  0.000000e+00   True
8    109    A  0.000000e+00 ...  0.000000e+00  0.000000e+00   True
9    110    B  1.200000e+11 ...  1.200000e+11  1.200000e+11  False
10   111    B  2.300000e+11 ...  2.300000e+11  2.300000e+11  False

If the columns are not consecutive, you can list their names beforehand and pass that list instead, e.g.:

cols_to_look_for = ["col8", "col13", "col21", "col34"]
df["mark"] = df.loc[:, cols_to_look_for].eq(0).all(axis=1)

(In the example in the question, the columns are consecutive, which is what allows the slice notation.)
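As a side note (a sketch, not part of the original answer): if listing 45 names by hand is tedious, one option is to start from the numeric columns via select_dtypes and subtract the known exclusions. The small frame and the assumption that col1 is the only numeric column to drop are illustrative:

```python
import pandas as pd

# Small illustrative frame: col1 is an id, col2 is categorical.
df = pd.DataFrame({
    "col1": [101, 102],
    "col2": ["A", "B"],
    "col5": [12e10, 0.0],
    "col6": [34e10, 0.0],
})

# Numeric columns minus explicit exclusions (col1 here is an assumption);
# non-numeric columns like col2 are dropped automatically.
cols = df.select_dtypes("number").columns.difference(["col1"])
df["mark"] = df[cols].eq(0).all(axis=1)
print(df["mark"].tolist())  # [False, True]
```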


5 Comments

Thanks Mustafa. I didn't know this eq() method. I tried isin() and all that but was stuck bad. Thanks.
@sammywemmy No Sammy. Mustafa got it right. I needed this new column 'mark'.
@sammywemmy The question reads "...Now I need to put True in a new column df['mark']...", actually.
Is there an equivalent "not equal" method as opposed to eq()?
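(A sketch addressing the last comment above, not part of the original answer:) pandas does provide ne() as the element-wise "not equal" counterpart of eq(); paired with any(axis=1), it flags rows that contain at least one non-zero value:

```python
import pandas as pd

df = pd.DataFrame({"col5": [0.0, 12e10], "col6": [0.0, 0.0]})

# ne(0) marks each cell that differs from zero;
# any(axis=1) is True for rows with at least one such cell.
df["has_nonzero"] = df.loc[:, "col5":"col6"].ne(0).any(axis=1)
print(df["has_nonzero"].tolist())  # [False, True]
```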

I tried another way to do this, using the sum() and map() methods to get the requisite values.

import numpy as np

df['mark'] = np.sign(df.loc[:, 'col5':'col19'].sum(axis=1)).map({-1: False, 1: False, 0: True})

The solution submitted by Mustafa is a much faster and better approach in my case. However, this can help if, instead of a boolean, you want a specific indicator for negative, positive, and zero row sums. (Note that a zero sum only implies all-zero values when the columns cannot contain negatives that cancel out, as is the case here.)
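For illustration (a sketch with made-up labels, not from the original answer), the same sign/map pattern can produce a three-way indicator instead of a boolean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col5": [12e10, 0.0, -5.0],
    "col6": [34e10, 0.0, -7.0],
})

# np.sign of each row sum is -1, 0, or +1; map() turns that into a label.
# The labels "neg"/"zero"/"pos" are purely illustrative.
df["indicator"] = np.sign(df.loc[:, "col5":"col6"].sum(axis=1)).map(
    {-1: "neg", 0: "zero", 1: "pos"}
)
print(df["indicator"].tolist())  # ['pos', 'zero', 'neg']
```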

