I want to perform the following operations on a DataFrame in a single step:

opr1 = df[df != ''] 
opr2 = df.dropna(axis=1, how='all', inplace= True) 
opr3 = df.dropna(axis=0, how='all', inplace= True)
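For reference, `dropna(..., inplace=True)` modifies `df` and returns `None`, so `opr2` and `opr3` above end up as `None`. A minimal sketch of how the three steps could be chained instead, assuming a small made-up DataFrame (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: column "c" and row 1 contain only empty strings.
df = pd.DataFrame({
    "a": ["x", "", "z"],
    "b": ["1", "", ""],
    "c": ["", "", ""],
})

cleaned = (
    df.where(df != "")            # same effect as df[df != '']: empty strings become NaN
      .dropna(axis=1, how="all")  # drop columns that are entirely NaN
      .dropna(axis=0, how="all")  # drop rows that are entirely NaN
)
print(cleaned)
```

Because none of the calls uses `inplace=True`, each one returns a new DataFrame and the original `df` is left unchanged.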

1 Answer

You can set entire rows to 0 by assigning through a boolean mask:

df[df['A'] == 0] = 0
print(df)
   A  B  C  D
0  0  0  0  0
1  6  7  2  8
2  2  8  6  3
3  0  0  0  0

An alternative is to use DataFrame.mask, which returns a new DataFrame:

df = df.mask(df['A'] == 0, 0)
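As a small illustration of what `mask` does here, the sketch below uses made-up values (only the `A == 0` pattern follows the example above). It shows that `mask` replaces values where the condition is True, that `DataFrame.where` with the inverted condition gives the same result, and that the original frame is not modified.

```python
import pandas as pd

# Hypothetical values; rows 0 and 3 have A == 0.
df = pd.DataFrame({"A": [0, 6, 2, 0], "B": [5, 7, 8, 1]})
cond = df["A"] == 0

masked = df.mask(cond, 0)   # replace values in rows where cond is True
kept = df.where(~cond, 0)   # keep values in rows where ~cond is True

print(masked)
print(masked.equals(kept))
```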

If performance is important, rebuild the DataFrame with the DataFrame constructor and numpy.where:

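import numpy as np
import pandas as pd

# [:, None] reshapes the row mask to a column so it broadcasts across all columns
df = pd.DataFrame(np.where(df['A'].to_numpy()[:, None] == 0, 0, df),
                  index=df.index,
                  columns=df.columns)
print(df)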
   A  B  C  D
0  0  0  0  0
1  6  7  2  8
2  2  8  6  3
3  0  0  0  0
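To check that the three approaches agree, here is a self-contained sketch. The input frame is an assumption: the values in rows 0 and 3 (where `A == 0`) are invented, since the original input is not shown; the other rows match the printed output.

```python
import numpy as np
import pandas as pd

# Hypothetical input: rows 0 and 3 have A == 0; their other values are made up.
df = pd.DataFrame({
    "A": [0, 6, 2, 0],
    "B": [5, 7, 8, 1],
    "C": [9, 2, 6, 4],
    "D": [3, 8, 3, 7],
})
row_mask = df["A"] == 0

# 1) Boolean-mask assignment, done on a copy so df stays intact.
by_assignment = df.copy()
by_assignment[row_mask] = 0

# 2) DataFrame.mask returns a new frame.
by_mask = df.mask(row_mask, 0)

# 3) numpy.where on the values; [:, None] makes the row mask a column
#    so that it broadcasts across every column.
by_where = pd.DataFrame(
    np.where(row_mask.to_numpy()[:, None], 0, df),
    index=df.index,
    columns=df.columns,
)

print(by_assignment)
```

All three results hold the same values; only the first form mutates the frame it is applied to.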

Timings on sample data with 10k rows, 4 columns, and 50% of rows matching the mask:

# 10k rows: repeat the 4-row sample 2500 times
df = pd.concat([df] * 2500, ignore_index=True)


In [101]: %timeit df[df['A'] == 0] = 0
465 µs ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [103]: %timeit df.mask(df['A'] == 0, 0)
2.56 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [105]: %timeit pd.DataFrame(np.where(df['A'].to_numpy()[:, None] == 0, 0, df),  index=df.index, columns=df.columns)
123 µs ± 666 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
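
Outside IPython, a comparable measurement could be sketched with the standard-library `timeit` module. The base frame and the helper function names below are assumptions for illustration, and absolute numbers will vary with the machine and the pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 4-row frame (half of the rows have A == 0), repeated to 10k rows.
base = pd.DataFrame({
    "A": [0, 6, 2, 0],
    "B": [5, 7, 8, 1],
    "C": [9, 2, 6, 4],
    "D": [3, 8, 3, 7],
})
df = pd.concat([base] * 2500, ignore_index=True)


def assign_in_place(frame):
    # Mutates frame: rows where A == 0 are set to 0.
    frame[frame["A"] == 0] = 0
    return frame


def with_mask(frame):
    return frame.mask(frame["A"] == 0, 0)


def with_np_where(frame):
    return pd.DataFrame(
        np.where(frame["A"].to_numpy()[:, None] == 0, 0, frame),
        index=frame.index,
        columns=frame.columns,
    )


timings = {}
for func in (assign_in_place, with_mask, with_np_where):
    work = df.copy()  # fresh copy so the in-place variant does not affect the others
    runs = timeit.repeat(lambda: func(work), number=50, repeat=3)
    timings[func.__name__] = min(runs) / 50  # best average time per call, in seconds

for name, per_call in timings.items():
    print(f"{name}: {per_call * 1e6:.0f} µs per call")
```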