The original DataFrame is as follows:

import numpy as np
import pandas as pd

s1 = pd.DataFrame([1, 'a', np.nan, np.nan, np.nan, 2, 'b', np.nan, np.nan, np.nan, 3, 'c', np.nan, np.nan, np.nan]).T
In [37]: s1
Out[37]: 
1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN  NaN

Desired DataFrame

NaN  1  NaN  NaN  NaN  NaN  2  NaN  NaN  NaN  NaN  3  NaN  NaN  NaN
NaN  a  NaN  NaN  NaN  NaN  b  NaN  NaN  NaN  NaN  c  NaN  NaN  NaN

My solution:

s2 = s1.shift(periods=1, axis=1)
s = pd.concat([s2, s1], axis='index', join='inner', ignore_index=True, copy=False)
print(s)
NaN  1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN
1  a  NaN  NaN  NaN  2  b  NaN  NaN  NaN  3  c  NaN  NaN  NaN

Next, how can I set every column to NaN except the columns in which both rows are non-NaN? I spent 2 hours on this small issue trying to come up with a Pythonic way to do it without if/else or a for loop. The last step will be:

s.fillna(method='ffill',axis=1,inplace=True)

Thanks in advance

  • So basically the data [1, 'a', NaN, NaN, NaN, NaN] means that the NaN goes for both 1 and a? That's why you want this same data displayed twice, one time with the 1 in the first row and one time with the a in the 2nd row? Commented Jan 7, 2018 at 6:28
  • Sorry, I do not understand what you mean, but there is no connection between the number of NaNs and 1 or 'a'. I just want to move the digits (1, 2, or 3) up one row so that they sit on top of 'a', 'b' and 'c' respectively. Commented Jan 7, 2018 at 6:45
  • ok, understood. And why does the desired output have a NaN as first column plus an additional NaN after 1 and 2 but not after 3? Is that intentional? Commented Jan 7, 2018 at 6:47

1 Answer

You can create a boolean mask for the columns that contain any NaN values and then set those columns to NaN with loc:

s2 = s1.shift(periods=1,axis=1)
#added ignore_index=True for default unique index
s = pd.concat([s2,s1], axis='index', ignore_index=True)

m = s.isnull().any()
#alternative
#m = ~s.notnull().all()
s.loc[:, m] = np.nan
print(s)
    0  1    2    3    4    5  6    7    8    9    10 11   12   13   14
0  NaN  1  NaN  NaN  NaN  NaN  2  NaN  NaN  NaN  NaN  3  NaN  NaN  NaN
1  NaN  a  NaN  NaN  NaN  NaN  b  NaN  NaN  NaN  NaN  c  NaN  NaN  NaN

Detail:

print(s.isnull())
     0      1     2     3     4     5      6     7     8     9     10     11  \
0  True  False  True  True  True  True  False  True  True  True  True  False   
1  True  False  True  True  True  True  False  True  True  True  True  False   

     12    13    14  
0  True  True  True  
1  True  True  True  

print(m)
0      True
1     False
2      True
3      True
4      True
5      True
6     False
7      True
8      True
9      True
10     True
11    False
12     True
13     True
14     True
dtype: bool
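
For convenience, the steps above can be put together in one self-contained script. This is only a sketch of the same approach, not a different method. The final forward fill from the question is written with DataFrame.ffill because recent pandas releases deprecate the method= argument of fillna; the variable name filled is introduced here for illustration and is not in the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the one-row frame from the question.
s1 = pd.DataFrame([1, 'a', np.nan, np.nan, np.nan,
                   2, 'b', np.nan, np.nan, np.nan,
                   3, 'c', np.nan, np.nan, np.nan]).T

# Shift one column to the right and stack the shifted row on top of the original.
s2 = s1.shift(periods=1, axis=1)
s = pd.concat([s2, s1], axis='index', ignore_index=True)

# True for every column that has at least one NaN in either row.
m = s.isnull().any()

# Blank out those columns; only fully populated columns keep their values.
s.loc[:, m] = np.nan

# Last step from the question: forward fill along each row.
# "filled" is a name chosen for this sketch.
filled = s.ffill(axis=1)
print(filled)
```

Column 0 stays NaN after the forward fill because there is nothing to its left to propagate.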

5 Comments

what does s.loc[:, m] = np.nan do?
It sets NaNs by boolean mask.
Yes, I got that, but the ", m" part: is that documented somewhere?
It means the mask is applied to the columns instead of the rows, which are the default. I am looking for it in the docs, but I cannot find it.
@hansaplast not sure if this is what you are looking for, pandas.pydata.org/pandas-docs/stable/….
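
To illustrate the ", m" part discussed in these comments, here is a minimal sketch on a small made-up frame (the frame df and the names col_mask, row_mask, cols_selected and rows_selected are hypothetical, not from the answer). In .loc[rows, columns] the first slot selects rows and the second selects columns, so a boolean Series indexed by column labels goes in the second slot, with ":" meaning "all rows".

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame({'x': [1.0, np.nan],
                   'y': [3.0, 4.0],
                   'z': [np.nan, 6.0]})

# One boolean per column, indexed by column label.
col_mask = df.isnull().any()

# One boolean per row, indexed by row label.
row_mask = df.isnull().any(axis=1)

# Second slot of .loc: the mask picks columns ('x' and 'z' contain a NaN).
cols_selected = df.loc[:, col_mask]

# First slot of .loc: the mask picks rows.
rows_selected = df.loc[row_mask]

# Assigning through the column mask overwrites whole columns.
df.loc[:, col_mask] = np.nan
print(df)
```

Only column 'y' has no NaN in this frame, so it is the only one left untouched by the assignment.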
