binary operation between rows in DataFrame

Question

The original DataFrame is as follows:

import numpy as np
import pandas as pd

s1 = pd.DataFrame([1, 'a', np.nan, np.nan, np.nan, 2, 'b', np.nan, np.nan, np.nan, 3, 'c', np.nan, np.nan, np.nan]).T
In [37]: s1
Out[37]: 
1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN  NaN

Desired DataFrame

NaN  1  NaN  NaN  NaN  NaN  2  NaN  NaN  NaN  NaN  3  NaN  NaN  NaN
NaN  a  NaN  NaN  NaN  NaN  b  NaN  NaN  NaN  NaN  c  NaN  NaN  NaN

My solution:

s2 = s1.shift(periods=1, axis=1)
s = pd.concat([s2, s1], axis='index', join='inner', ignore_index=True, copy=False)
print(s)
NaN  1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN  NaN
1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN  NaN

Then, how can I set every value in a column to NaN unless both rows in that column are non-NaN? I spent 2 hours on this small issue trying to come up with a pythonic way to do it without if/else/for loops. The last step will be:

s.fillna(method='ffill',axis=1,inplace=True)

Thanks in advance

So basically the data [1, 'a', NaN, NaN, NaN, NaN] means that the NaN goes for both 1 and a? That's why you want this same data displayed twice, one time with the 1 in the first row and one time with the a in the 2nd row?
– hansaplast, Commented Jan 7, 2018 at 6:28
Sorry, I do not understand what you mean, but there is no connection between the number of NaN values and 1 or 'a'. I just want to move the digits (1, 2, or 3) up one layer so they sit on top of 'a', 'b' and 'c' respectively.
– Yan Tian, Commented Jan 7, 2018 at 6:45
OK, understood. And why does the desired output have a NaN as the first column plus an additional NaN after 1 and 2 but not after 3? Is that intentional?
– hansaplast, Commented Jan 7, 2018 at 6:47

jezrael · Accepted Answer · 2018-01-07 06:59:20Z

You can create a boolean mask of the columns that contain any NaN values and then set those columns to NaN with loc:

s2 = s1.shift(periods=1,axis=1)
#added ignore_index=True for default unique index
s = pd.concat([s2,s1], axis='index', ignore_index=True)

m = s.isnull().any()
#alternative
#m = ~s.notnull().all()
s.loc[:, m] = np.nan
print(s)
    0  1    2    3    4    5  6    7    8    9    10 11   12   13   14
0  NaN  1  NaN  NaN  NaN  NaN  2  NaN  NaN  NaN  NaN  3  NaN  NaN  NaN
1  NaN  a  NaN  NaN  NaN  NaN  b  NaN  NaN  NaN  NaN  c  NaN  NaN  NaN

Detail:

print(s.isnull())
     0      1     2     3     4     5      6     7     8     9     10     11  \
0  True  False  True  True  True  True  False  True  True  True  True  False   
1  True  False  True  True  True  True  False  True  True  True  True  False   

     12    13    14  
0  True  True  True  
1  True  True  True  

print(m)
0      True
1     False
2      True
3      True
4      True
5      True
6     False
7      True
8      True
9      True
10     True
11    False
12     True
13     True
14     True
dtype: bool
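
For anyone who wants to run the steps above in one go, here is a minimal self-contained sketch. It rebuilds s1 from the question and adds the imports; the name kept is only a hypothetical helper for inspecting which columns survive, it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the one-row frame from the question (15 columns).
s1 = pd.DataFrame([1, 'a', np.nan, np.nan, np.nan,
                   2, 'b', np.nan, np.nan, np.nan,
                   3, 'c', np.nan, np.nan, np.nan]).T

# Shift one column to the right so each digit lines up above its letter.
s2 = s1.shift(periods=1, axis=1)
s = pd.concat([s2, s1], axis='index', ignore_index=True)

# True for every column that contains at least one NaN.
m = s.isnull().any()

# Overwrite those columns with NaN; only fully populated columns keep values.
s.loc[:, m] = np.nan

# Hypothetical helper: labels of the columns that were left untouched.
kept = s.columns[~m].tolist()
print(kept)
print(s.loc[:, kept])
```

Only the columns where both rows were non-NaN keep their values, which is the layout asked for in the question.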

edited Jan 7, 2018 at 6:59

answered Jan 7, 2018 at 6:54


5 Comments

hansaplast Over a year ago

what does s.loc[:, m] = np.nan do?

jezrael Over a year ago

It sets NaN values using a boolean mask.

hansaplast Over a year ago

yes, I got that, but the , m thingie, is that documented somewhere?

jezrael Over a year ago

It means the mask is applied to the columns instead of the rows, which is the default. I am looking for it in the docs, but I cannot find it.

Yan Tian Over a year ago

@hansaplast not sure if this is what you are looking for, pandas.pydata.org/pandas-docs/stable/….
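
On the `, m` question: .loc takes a row indexer and a column indexer, and it accepts a boolean array as either one. Here is a small sketch on a toy frame that shows the mask selecting columns; df and selected are made-up names for illustration and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Toy frame: only column 'b' contains a NaN.
df = pd.DataFrame({'a': [1.0, 2.0], 'b': [np.nan, 4.0], 'c': [5.0, 6.0]})

# Boolean Series indexed by column label, True where the column holds a NaN.
m = df.isnull().any()

# ':' keeps every row; m picks the columns where the mask is True.
selected = df.loc[:, m]
print(list(selected.columns))

# Assigning through the same indexer overwrites only the selected columns.
df.loc[:, m] = np.nan
print(df)
```

So the first position (:) means all rows and the second position (m) picks columns by the mask; swapping them around, as in df.loc[m, :], would try to use the mask on rows instead.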
