
I am starting to dig deeper into Python and am having trouble converting some of my R scripts into Python. I have a function defined in R:

Shft_Rw <- function(x) {
  for (row in 1:nrow(x)) {
    new_row <- x[row, c(which(!is.na(x[row, ])), which(is.na(x[row, ])))]
    colnames(new_row) <- colnames(x)
    x[row, ] <- new_row
  }
  return(x)
}

Which essentially takes the NAs in each row of a dataframe and moves them to the end of the row, shifting the non-NA values to the front, i.e.

import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [np.nan, np.nan, 3], 'b': [3, np.nan, 5], 'c': [3, 4, 5]})

df
Out[156]: 
     a    b  c
0  NaN  3.0  3
1  NaN  NaN  4
2  3.0  5.0  5

turns into:

df2 = pd.DataFrame({'a': [3, 4, 3], 'b': [3, np.nan, 5], 'c': [np.nan, np.nan, 5]})
df2
Out[157]: 
   a    b    c
0  3  3.0  NaN
1  4  NaN  NaN
2  3  5.0  5.0

So far I have:

def Shft_Rw(x):
    for row in np.arange(0, x.shape[0]):
        new_row = x.iloc[row, [np.where(pd.notnull(x.iloc[row])), np.where(pd.isnull(x.iloc[row]))]]

But it's throwing errors. Using the sample df above, I can get a row by index with iloc, and I can get the column positions where it is null/not null (using where()), but I can't put the two together (I've tried numerous variations with more brackets etc.).

df.iloc[1]
Out[170]: 
a    NaN
b    NaN
c    4.0

In [167]: np.where(pd.isnull(df.iloc[1]))
Out[167]: (array([0, 1], dtype=int64),)

df.iloc[1,np.where(pd.notnull(df.iloc[1]))]
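One way to make that indexing work is to take the integer array out of each np.where tuple and concatenate them into a single positional ordering for the row. A sketch of that fix, keeping the original Shft_Rw name and assuming all-numeric columns (cast to float so NaN can live in every column):

```python
import numpy as np
import pandas as pd

def Shft_Rw(x):
    # work on a float copy so int columns can hold NaN
    out = x.copy().astype(float)
    for row in range(out.shape[0]):
        vals = out.iloc[row].values
        # non-null positions first, then null positions
        order = np.concatenate([np.where(pd.notnull(vals))[0],
                                np.where(pd.isnull(vals))[0]])
        out.iloc[row] = vals[order]
    return out

df = pd.DataFrame({'a': [np.nan, np.nan, 3], 'b': [3, np.nan, 5], 'c': [3, 4, 5]})
print(Shft_Rw(df))
```

The key difference from the attempt above is the `[0]` after each np.where call: np.where returns a tuple of index arrays, so indexing with the raw tuples fails.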

Anyone able to help replicate the function AND/OR show a more efficient way to solve the problem?

Thanks!

2
  • What should happen with a row such as "2 NaN 3"? Is the expected output "2 NaN 3" or "3 2 NaN"? Commented Jul 8, 2018 at 0:16
  • For my specific purpose of analysis I would do either a forward fill with the last actual result OR a simple linear interpolation, i.e. (2, 2, 3) or (2, 2.5, 3). Even further, if the original line was (NA, NA, 2, NA, 3) I would want it transformed to: (2, 2, 3, NA, NA). I haven't seen any instance of that in my dataset yet, but great question, as I am sure that instance could arise. Commented Jul 8, 2018 at 11:48
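To illustrate the two options mentioned in that comment, a minimal sketch using pandas' built-in ffill and interpolate along the row axis (this handles interior NaNs; leading NaNs are untouched by a forward fill):

```python
import numpy as np
import pandas as pd

row = pd.DataFrame([[2.0, np.nan, 3.0]], columns=['a', 'b', 'c'])

# forward fill: carry the last observed value forward -> 2, 2, 3
print(row.ffill(axis=1))

# linear interpolation between the neighbouring values -> 2, 2.5, 3
print(row.interpolate(axis=1))
```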

1 Answer


Use apply with dropna:

df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df1.columns = df.columns
print (df1)
     a    b    c
0  3.0  3.0  NaN
1  4.0  NaN  NaN
2  3.0  5.0  5.0

If performance is important, I suggest using this justify function:

arr = justify(df.values, invalid_val=np.nan, axis=1, side='left')
df1 = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df1)
     a    b    c
0  3.0  3.0  NaN
1  4.0  NaN  NaN
2  3.0  5.0  5.0
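The justify helper itself is not reproduced in the answer (it was linked). A sketch of a compatible implementation, assuming a numeric 2D array as input:

```python
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """Push the valid (non-invalid) entries of a 2D array to one side."""
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # sorting booleans pushes True values to the high end of each row/column
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
```

This is vectorised: the boolean sort builds the target mask in one shot, so no Python-level loop over rows is needed.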

3 Comments

Awesome! That worked - just had to do one interim step. Apparently using groupby changes nan's to 0, so just had to do a .replace(0, np.nan) before your solution. Thanks!
On second thought it was probably the .aggregate(np.sum) which converted the nan's
@HowdyDude I think it is possible to use .sum(min_count=1) instead of .aggregate(np.sum); check this.
