
I have a DataFrame df with three columns: a, b and c. I want to fill the NaN values in column b. For example, for the value 123 in column a, column b contains both abc and NaN. I want both rows to have abc.

import numpy as np
import pandas as pd

raw_data = {'a': [123, 123, 456, 456],
            'b': [np.nan, 'abc', 'def', np.nan],
            'c': [np.nan, np.nan, 0, np.nan]}
df = pd.DataFrame(raw_data, columns=['a', 'b', 'c'])

    a   b   c
0   123 NaN NaN
1   123 abc NaN
2   456 def 0
3   456 NaN NaN

My expected output:

    a   b   c
1   123 abc NaN
0   123 abc NaN
2   456 def 0
3   456 def NaN

What I have tried:

df = df.sort_values(by=['a','b']).fillna(method='ffill')

But this also changes column c.

Output from above:

    a   b   c
1   123 abc NaN
0   123 abc NaN
2   456 def 0
3   456 def 0

How do I use ffill on a single column? Other recommended approaches are welcome too.

Sample Data 2:

raw_data = {'a': [123, 123, 456, 456, 789, np.nan],
            'b': [np.nan, 'abc', 'def', np.nan, np.nan, 'ghi'],
            'c': [np.nan, np.nan, 0, np.nan, np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['a', 'b', 'c'])

           a    b   c
    0   123.0   NaN NaN
    1   123.0   abc NaN
    2   456.0   def 0
    3   456.0   NaN NaN
    4   789.0   NaN NaN
    5   NaN     ghi NaN

Expected Output

           a    b   c
    0   123.0   abc NaN
    1   123.0   abc NaN
    2   456.0   def 0
    3   456.0   def NaN
    4   789.0   NaN NaN
    5   NaN     ghi NaN
If you want to change only b, then assign only b: df['b'] = df.sort_values(by=['a','b']).fillna(method='ffill')['b']. Commented Apr 15, 2020 at 19:35

2 Answers

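For your updated data, use Series.map to look up the first non-null b for each value of a, and fall back to the original b where the lookup finds nothing: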

df['b'] = df['a'].map(df.groupby('a')['b'].first()).fillna(df['b'])

       a    b    c
0  123.0  abc  NaN
1  123.0  abc  NaN
2  456.0  def  0.0
3  456.0  def  NaN
4  789.0  NaN  NaN
5    NaN  ghi  NaN
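
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps, using the second sample from the question. The intermediate names first_b and looked_up are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [123, 123, 456, 456, 789, np.nan],
    'b': [np.nan, 'abc', 'def', np.nan, np.nan, 'ghi'],
    'c': [np.nan, np.nan, 0, np.nan, np.nan, np.nan],
})

# Step 1: first non-null 'b' for each value of 'a'.
# Rows whose 'a' is NaN do not form a group by default.
first_b = df.groupby('a')['b'].first()

# Step 2: look up each row's 'a' in that table; an 'a' with no entry gives a missing value.
looked_up = df['a'].map(first_b)

# Step 3: where the lookup found nothing, keep the row's original 'b'.
df['b'] = looked_up.fillna(df['b'])

print(df)
```

Only column b is assigned, so column c is left as it was.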

Old answer

Use groupby with ffill and bfill:

df['b'] = df.groupby('a')['b'].ffill().bfill()

     a    b    c
0  123  abc  NaN
1  123  abc  NaN
2  456  def  0.0
3  456  def  NaN
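
One caveat with the old answer: the ffill runs per group, but the chained bfill runs on the whole resulting column, so a group with no value of its own can pick up a value from a later group. A rough sketch of the difference follows. The data is hypothetical (the question's second sample with the NaN key replaced by 999 so every row belongs to a group), and the transform variant is one possible alternative, not something taken from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: like the question's second sample, but the last key is 999
# instead of NaN so that every row belongs to a group.
df = pd.DataFrame({
    'a': [123, 123, 456, 456, 789, 999],
    'b': [np.nan, 'abc', 'def', np.nan, np.nan, 'ghi'],
})

# ffill is applied per group, but the chained bfill sees the whole column.
leaky = df.groupby('a')['b'].ffill().bfill()

# Running both fills inside transform keeps each group isolated.
per_group = df.groupby('a')['b'].transform(lambda s: s.ffill().bfill())

print(leaky.loc[4])      # 789 has no value of its own; here it picks up 'ghi' from 999
print(per_group.loc[4])  # stays missing
```

On the first sample both forms give the same result, because every group there has at least one non-null b.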

3 Comments

It worked. Can you check the edit? I added another sample data set.
@Zanthoxylumpiperitum What output would you like for the row 789.0 NaN?
The same as it is: 789.0 NaN NaN. I updated the question.

The fillna function applies to the whole DataFrame. One solution is to select only the columns you want to modify, apply fillna to that subset, and then add the other column back:

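# Work on columns a and b only, so c is never touched by the fill.
df_ab = df.loc[:, ["a", "b"]]
df_ab = df_ab.sort_values(by=['a', 'b']).fillna(method='ffill')
# Column assignment aligns on the index, so c lands on the right rows after sorting.
df_ab["c"] = df["c"]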
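
A shorter variant of the same idea is to forward-fill only the b column of the sorted frame and assign it back. This is a sketch on the question's first sample, using the ffill method instead of fillna(method='ffill'); the name filled_b is illustrative. It assumes every value of a has at least one non-null b, since a plain forward fill would otherwise carry a value over from the previous group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [123, 123, 456, 456],
    'b': [np.nan, 'abc', 'def', np.nan],
    'c': [np.nan, np.nan, 0, np.nan],
})

# Sort so that NaN in 'b' comes last within each 'a', then forward-fill 'b' only.
filled_b = df.sort_values(by=['a', 'b'])['b'].ffill()

# Assignment aligns on index labels, so the sorted order does not matter here.
df['b'] = filled_b

print(df)
```

Because only b is selected before the fill, column c keeps its NaN values.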
