Setting nan to rows in pandas dataframe based on column value

Question

Using:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

a = pd.read_csv('file.csv', na_values=['-9999.0'], decimal=',')
a.index = pd.to_datetime(a[['Year', 'Month', 'Day', 'Hour', 'Minute']])
pd.options.mode.chained_assignment = None

The dataframe is something like:

Index               A    B       C      D
2016-07-20 18:00:00 9   4.0     NaN    2
2016-07-20 19:00:00 9   2.64    0.0    3
2016-07-20 20:00:00 12  2.59    0.0    1
2016-07-20 21:00:00 9   4.0     NaN    2

The main objective is to set the entire row to np.nan if the value in column A is 9 and the value in column D is 2 at the same time, for example:

Output expectation

Index               A    B       C      D
2016-07-20 18:00:00 NaN NaN     NaN    NaN
2016-07-20 19:00:00 9   2.64    0.0    3
2016-07-20 20:00:00 12  2.59    0.0    1
2016-07-20 21:00:00 NaN NaN     NaN    NaN

Would be thankful if someone could help.

piRSquared · Accepted Answer · 2017-08-15 14:28:47Z

4

Option 1
This is the inverse of @jezrael's mask solution: where keeps the rows for which the condition is True and sets the rest to NaN.

a.where(a.A.ne(9) | a.D.ne(2))

                        A     B    C    D
Index                                    
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN
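
To make the "inverse" relationship concrete, here is a minimal sketch, assuming a small frame that mirrors the sample data from the question. The variable names drop, keep, via_where and via_mask are only illustrative. It checks that the where condition is the negation of the mask condition (De Morgan's law) and that both calls give the same frame.

```python
import numpy as np
import pandas as pd

# Small frame mirroring the question's sample data.
idx = pd.to_datetime([
    "2016-07-20 18:00", "2016-07-20 19:00",
    "2016-07-20 20:00", "2016-07-20 21:00",
])
a = pd.DataFrame(
    {"A": [9, 9, 12, 9], "B": [4.0, 2.64, 2.59, 4.0],
     "C": [np.nan, 0.0, 0.0, np.nan], "D": [2, 3, 1, 2]},
    index=idx,
)

drop = a["A"].eq(9) & a["D"].eq(2)   # rows that should become NaN
keep = a["A"].ne(9) | a["D"].ne(2)   # De Morgan: the same as ~drop

via_where = a.where(keep)   # keeps values where the condition is True
via_mask = a.mask(drop)     # replaces values where the condition is True

print(keep.equals(~drop))
print(via_where.equals(via_mask))
```

Which of the two reads better is mostly a matter of taste: mask states the rows to blank out, where states the rows to keep.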

Option 2
pd.DataFrame.reindex

a[a.A.ne(9) | a.D.ne(2)].reindex(a.index)

                        A     B    C    D
Index                                    
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN
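
For anyone wondering why Option 2 works, here is a rough two-step sketch on a frame that mirrors the question's sample (the names kept and restored are made up for illustration). The boolean filter first drops the matching rows, then reindex brings their labels back, and labels with no data are filled with NaN.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2016-07-20 18:00", "2016-07-20 19:00",
    "2016-07-20 20:00", "2016-07-20 21:00",
])
a = pd.DataFrame(
    {"A": [9, 9, 12, 9], "B": [4.0, 2.64, 2.59, 4.0],
     "C": [np.nan, 0.0, 0.0, np.nan], "D": [2, 3, 1, 2]},
    index=idx,
)

# Step 1: keep only rows that do NOT match A == 9 and D == 2.
kept = a[a["A"].ne(9) | a["D"].ne(2)]

# Step 2: reindex against the original index; missing labels become NaN rows.
restored = kept.reindex(a.index)

print(len(kept), len(restored))
print(restored)
```

This sketch assumes the index labels are unique, as they are in the sample data.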

answered Aug 15, 2017 at 14:28

piRSquared


MaxU - stand with Ukraine · Accepted Answer · 2017-08-15 14:29:00Z

4

Try this:

df.loc[df.A.eq(9) & df.D.eq(2)] = [np.nan] * len(df.columns)

Demo:

In [158]: df
Out[158]:
                      A     B    C  D
Index
2016-07-20 18:00:00   9  4.00  NaN  2
2016-07-20 19:00:00   9  2.64  0.0  3
2016-07-20 20:00:00  12  2.59  0.0  1
2016-07-20 21:00:00   9  4.00  NaN  2

In [159]: df.loc[df.A.eq(9) & df.D.eq(2)] = [np.nan] * len(df.columns)

In [160]: df
Out[160]:
                        A     B    C    D
Index
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN

Alternatively, we can use the DataFrame.where() method:

In [174]: df = df.where(~(df.A.eq(9) & df.D.eq(2)))

In [175]: df
Out[175]:
                        A     B    C    D
Index
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN

edited Aug 15, 2017 at 14:29

answered Aug 15, 2017 at 14:20

MaxU - stand with Ukraine


9 Comments

jezrael Over a year ago

I get ValueError: cannot set using a list-like indexer with a different length than the value for first solution :(

MaxU - stand with Ukraine Over a year ago

@jezrael, can you provide a sample data set to reproduce this error?

MaxU - stand with Ukraine Over a year ago

@jezrael, i can't reproduce it

MaxU - stand with Ukraine Over a year ago

@jezrael, pandas: 0.20.1

jezrael Over a year ago

Hmmm, ok. After you change the answer I can add your solution to the timings. Thanks.


jezrael · Accepted Answer · 2017-08-16 11:00:26Z

Use mask, which creates NaNs by default:

df = a.mask((a['A'] == 9) & (a['D'] == 2))
print (df)
                        A     B    C    D
Index                                    
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN

Or use boolean indexing and assign NaN:

a[(a['A'] == 9) & (a['D'] == 2)] = np.nan
print (a)
                        A     B    C    D
Index                                    
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN
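
One thing visible in the outputs above is that A and D change from integers (9, 2) to floats (9.0, 2.0), because NaN is a float value. Below is a minimal sketch that makes that conversion explicit before assigning in place. Casting with astype(float) first is my own cautious choice here, not part of the original answer, and the sample frame is only an approximation of the question's data.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {"A": [9, 9, 12, 9], "B": [4.0, 2.64, 2.59, 4.0],
     "C": [np.nan, 0.0, 0.0, np.nan], "D": [2, 3, 1, 2]}
)

# A and D start out as integer columns.
print(pd.api.types.is_integer_dtype(a["A"]))

# Assumption: cast to float up front so the NaN assignment does not
# need to change any column's dtype behind the scenes.
a = a.astype(float)

# Same boolean-indexing assignment as in the answer.
a[(a["A"] == 9) & (a["D"] == 2)] = np.nan

print(a.dtypes)
print(a)
```

If the integer dtype matters downstream, the non-mutating mask call returns a new frame and leaves the original untouched.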

Timings:

np.random.seed(123)
N = 1000000
L = list('abcdefghijklmnopqrst'.upper())

a = pd.DataFrame(np.random.choice([np.nan,2,9], size=(N,20)), columns=L)

#jez2
In [256]: %timeit a[(a['A'] == 9) & (a['D'] == 2)] = np.nan
10 loops, best of 3: 25.8 ms per loop

#jez2upr
In [257]: %timeit a.loc[(a['A'] == 9) & (a['D'] == 2)] = np.nan
10 loops, best of 3: 27.6 ms per loop

#Wen
In [258]: %timeit a.mul(np.where((a.A==9)&(a.D==2),np.nan,1),0)
10 loops, best of 3: 90.5 ms per loop

#jez1
In [259]: %timeit a.mask((a['A'] == 9) & (a['D'] == 2))
1 loop, best of 3: 316 ms per loop

#maxu2
In [260]: %timeit a.where(~(a.A.eq(9) & a.D.eq(2)))
1 loop, best of 3: 318 ms per loop

#pir1
In [261]: %timeit a.where(a.A.ne(9) | a.D.ne(2))
1 loop, best of 3: 316 ms per loop

#pir2
In [263]: %timeit a[a.A.ne(9) | a.D.ne(2)].reindex(a.index)
1 loop, best of 3: 355 ms per loop
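
%timeit only exists inside IPython, and the in-place statements change a on their first loop, so later loops find no matching rows left. Here is a hedged sketch of how the comparison could be rerun with the standard timeit module on a fresh copy per repeat. N is reduced to 100,000 and the labels in candidates are my own; no numbers are claimed, since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 100_000  # assumption: smaller than the 1,000,000 rows above, to keep the run short
cols = list("ABCDEFGHIJKLMNOPQRST")
base = pd.DataFrame(np.random.choice([np.nan, 2, 9], size=(N, 20)), columns=cols)

# Statements taken from the timings above; the dictionary keys are illustrative.
candidates = {
    "bool_assign": "a[(a['A'] == 9) & (a['D'] == 2)] = np.nan",
    "loc_assign": "a.loc[(a['A'] == 9) & (a['D'] == 2)] = np.nan",
    "mask": "a.mask((a['A'] == 9) & (a['D'] == 2))",
    "where": "a.where(a.A.ne(9) | a.D.ne(2))",
}

results = {}
for name, stmt in candidates.items():
    # setup runs once per repeat, so in-place statements always see fresh data
    times = timeit.repeat(
        stmt,
        setup="a = base.copy()",
        repeat=3,
        number=1,
        globals={"base": base, "np": np},
    )
    results[name] = min(times)

for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.1f} ms")
```

The copy happens in setup, so it is not included in the measured time.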

BENY · Accepted Answer · 2017-08-15 15:00:54Z

2

Or you can try using .mul after np.where:

a=np.where((df2.A==9)&(df2.D==2),np.nan,1)
df2.mul(a,0)
# as one line: df.mul(np.where((df.A==9)&(df.D==2),np.nan,1), axis=0)

                        A     B    C    D
Index                                    
2016-07-20 18:00:00   NaN   NaN  NaN  NaN
2016-07-20 19:00:00   9.0  2.64  0.0  3.0
2016-07-20 20:00:00  12.0  2.59  0.0  1.0
2016-07-20 21:00:00   NaN   NaN  NaN  NaN
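
A small sketch of the multiplication trick, assuming an all-numeric frame like the one in the question (the names factor, out and by_columns are illustrative). The array from np.where holds one factor per row, so it has to be applied along axis=0. Without the axis argument, pandas matches a 1-D array against the columns, which happens to run on this 4x4 sample but blanks out columns instead of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [9, 9, 12, 9], "B": [4.0, 2.64, 2.59, 4.0],
     "C": [np.nan, 0.0, 0.0, np.nan], "D": [2, 3, 1, 2]}
)

# One factor per row: NaN for rows to blank out, 1 for rows to keep.
factor = np.where((df["A"] == 9) & (df["D"] == 2), np.nan, 1)

# axis=0 lines the factors up with the rows.
out = df.mul(factor, axis=0)
print(out)

# Default axis lines the factors up with the columns instead;
# on this 4x4 frame that silently blanks columns A and D.
by_columns = df.mul(factor)
print(by_columns)
```

Since this relies on arithmetic, it only fits frames whose columns are all numeric.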

edited Aug 15, 2017 at 15:00

answered Aug 15, 2017 at 14:43

BENY


3 Comments

piRSquared Over a year ago

This is clever (-:

MaxU - stand with Ukraine Over a year ago

yes, indeed, it's a smart option! We can make a one-liner out of it: df.mul(np.where((df.A==9)&(df.D==2),np.nan,1))

BENY Over a year ago

@MaxU thank you. Yes, you are right, the one-liner is neater.
