1

I have the following data:

    Col1 Col2 Col3 Col4
     A    G     K
     B          L   Q
     C    H     M
     D          N   R
     E    I     O   
     F    J     P

This is what I would like:

    Col1 NEW  Col3 
     A    G     K
     B    Q     L   
     C    H     M
     D    R     N   
     E    I     O   
     F    J     P

I've tried using the following solution and it sort of worked the way I needed. What ended up happening was that I would have dataframe A and then dataframe B. B would contain my new column and A would remain unchanged.

II have attempted to use .replace() and .fillna() but I can't seem to get the syntax right. For some reason when I try to do something like:

      csv[Col2] = csv.replace(to_replace=" ", value=csv[Col4]])

I end up having the entire contents of Col4 overwrite Col2 (blank spaces included). This seems to happen with .replace() and .fillna().

I'm not sure if it helps but in these situations Col2 would only be missing data where Col4 had data. Any ideas?

SOLUTION

Thank you to everyone that replied. I realize now that I have been chasing my tail because of a syntactical error. This was another line of code I tried but couldn't make work. I got the "One column overwrites another" issue (see above).

    csv[Col2] = csv.fillna(csv[Col4]])

But after revisiting that page John Galt linked below I felt like I was missing something..Then it hit me like a train.

    csv[Col2] = csv[Col2].fillna(csv[Col4]])

This worked perfectly. And I feel I like paying closer attention may have resolved this a lot sooner. Thank you all for your patients!

UPDATE ONE

I am appending more information about my dataframe in case it ever helps anyone in the future.

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 50000 entries, 0 to 49999
    Data columns (total 6 columns):
    Col 1                      50000 non-null object
    Col 2                      4652 non-null object
    Col 3                      50000 non-null object
    Col 4                      45347 non-null object
    Col 5                      50000 non-null object
    Col 6                      50000 non-null object
    dtypes: object(6)
    memory usage: 1.5+ MB
    None
7
  • Are the blanks empty strings or nan? You should be able to do df['NEW'] = df[['Col2', 'Col4']].sum(axis=1) Commented Apr 23, 2015 at 21:07
  • 1
    This stackoverflow.com/q/29832455/2137255 helps? Commented Apr 23, 2015 at 21:09
  • Could you replace all the whitespace with blanks: df['Col2'] = df['Col2'].str.replace(' ','') and do the same for Col4 and then my previous comment should work Commented Apr 23, 2015 at 21:12
  • My apologies for the delay. @EdChum, I believe they are just blanks. If I export the dataframe to a CSV then cells show up empty. If I index it and then export it to CSV I will see the NAN values. With that being said I tried your solution and ended up getting a "cannot concatenate 'str' and 'int' objects." I feel this may be partially my fault as my example only showed strings. I didn't feel the actual data mattered as I was attempting to fill empty cells with another column. Commented Apr 24, 2015 at 13:25
  • Can you post representative data then as this is now an x y problem Commented Apr 24, 2015 at 13:27

1 Answer 1

2

To avoid creating any new temporary variables or dataframes, you could just replace 'Col2' with new values and then drop 'Col4'. There are a few ways to do that.

Numpy where() is a good general solution here that can handle ''(empty or blank), ' '(space), or nan:

df['Col2'] = np.where( df.Col2 == ' ',   df.Col4, df.Col2 )  # space

df['Col2'] = np.where( df.Col2.isnull(), df.Col4, df.Col2 )  # blank or nan 

@EdChum's answer should also work and fillna() may be easiest (as in @JohnGalt's link), but won't work if you have a space character there.

df['Col2'] = df['Col2'].fillna(df.Col4)

Afterwards, just drop 'Col4'

df = df.drop('Col4',axis=1)
Sign up to request clarification or add additional context in comments.

1 Comment

Marking this as the answer because it consolidates all of the solutions. I fet I owed it to the community to at least test all three solutions. I was able to get the desired result with the Numpy solution as well as .fillna() (after using the right syntax). I feel as though EdChum's solution would work with a different data set.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.