
I want to drop rows with zero value in specific columns

>>> df
   salary  age  gender
0   10000   23       1
1   15000   34       0
2   23000   21       1
3       0   20       0
4   28500    0       1
5   35000   37       1

Some values in the salary and age columns are missing and are recorded as 0. The third column, gender, is a binary variable where 1 means male and 0 means female, so a 0 there is not missing data. I want to drop every row in which either salary or age is missing, so that I get:

>>> df
   salary  age  gender
0   10000   23       1
1   15000   34       0
2   23000   21       1
3   35000   37       1

  • df = df[(df['salary'] > 0) & (df['age'] > 0)] Commented Apr 15, 2018 at 12:41
  • Thank you for editing the format for me. Commented Apr 15, 2018 at 12:45
  • This is my first time asking a question on this forum. After posting it I saw that the formatting was terrible, and you edited it before I could. Thanks a lot. Commented Apr 15, 2018 at 12:46
  • df = df[(df['Patient_Sub_Market'] != '0') & (df['Patient_Zip_and_City'] != '0')] I got rid of the rows that had zeros using the above statement. Commented Apr 14, 2023 at 18:27

1 Answer


Option 1

You can filter your dataframe using pd.DataFrame.loc:

df = df.loc[~((df['salary'] == 0) | (df['age'] == 0))]

Option 2

Or, more compactly, test the product of the two columns:

df = df.loc[df['salary'] * df['age'] != 0]

This works because if either salary or age is 0, their product is also 0.
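
As a quick check, here is a minimal sketch that rebuilds the sample frame from the question and confirms that Options 1 and 2 select the same rows. The reset_index(drop=True) call is my addition, assumed to be wanted because the desired output in the question is renumbered from 0 to 3.

```python
import pandas as pd

# Sample data copied from the question; 0 in salary/age marks a missing value.
df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": [1, 0, 1, 0, 1, 1],
})

# Option 1: drop rows where salary or age equals 0.
opt1 = df.loc[~((df["salary"] == 0) | (df["age"] == 0))]

# Option 2: the product is 0 whenever either factor is 0.
opt2 = df.loc[df["salary"] * df["age"] != 0]

# Both masks keep the same rows on this data.
assert opt1.equals(opt2)

# Assumption: renumber the index to match the output shown in the question.
result = opt1.reset_index(drop=True)
print(result)
```

Note that gender is never part of the mask, so its legitimate 0 values are left alone.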

Option 3

The following method can be easily extended to several columns:

df = df.loc[(df[['salary', 'age']] != 0).all(axis=1)]
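
To illustrate how this extends to any list of columns, here is a small sketch applied to the question's data. The helper name drop_zero_rows is hypothetical and not part of pandas; it only wraps the comparison and .all(axis=1) call shown above.

```python
import pandas as pd

def drop_zero_rows(frame, columns):
    """Hypothetical helper: keep rows where none of `columns` equals 0."""
    # Compare only the chosen columns, then require every comparison
    # in a row to be True.
    keep = (frame[columns] != 0).all(axis=1)
    return frame.loc[keep]

df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": [1, 0, 1, 0, 1, 1],
})

# gender is left out of the list, so its 0 values are not treated as missing.
cleaned = drop_zero_rows(df, ["salary", "age"])
print(cleaned)
```

Adding another column to the check only requires extending the list passed as the second argument.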

Explanation

  • In all three cases, a Boolean mask is generated and used to index the dataframe.
  • All of these methods may be sped up further by working on the underlying NumPy arrays, e.g. df['salary'].values, which avoids some pandas overhead when building the mask.
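
A minimal sketch of the NumPy variant, assuming the same sample data as in the question. Whether it is measurably faster depends on the size of the frame, so treat it as something to benchmark rather than a guaranteed gain.

```python
import pandas as pd

df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": [1, 0, 1, 0, 1, 1],
})

# Pull out the raw NumPy arrays so the comparisons skip index alignment.
salary = df["salary"].values
age = df["age"].values

# Plain NumPy Boolean array: True where both values are non-zero.
mask = (salary != 0) & (age != 0)

# A Boolean NumPy array of matching length can index the dataframe directly.
fast = df[mask]
print(fast)
```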