1

I have the following dataframe with firstname and lastname columns. I want to create a fullname column.

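import numpy as np
import pandas as pd

# pd.np is no longer available in recent pandas; use numpy directly
df1 = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                    'lastname': [np.nan, 'obrien', 'trump']})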

print(df1)

  firstname lastname
0      jack      NaN
1      john   obrien
2    donald    trump

This works if there are no NaN values:

df1['fullname'] = df1['firstname']+df1['lastname']

However, since there are NaNs in my dataframe, I decided to cast to string first, but that breaks the fullname column:

df1['fullname'] = str(df1['firstname'])+str(df1['lastname'])


  firstname lastname                                           fullname
0      jack      NaN  0      jack\n1      john\n2    donald\nName: f...
1      john   obrien  0      jack\n1      john\n2    donald\nName: f...
2    donald    trump  0      jack\n1      john\n2    donald\nName: f...

I could write a function that checks for NaNs and inserts the data into the new column, but before I do that: is there a faster, built-in way to combine these strings into one column?

5 Answers

3

You need to handle the NaNs with .fillna(). Here, you can fill them with an empty string ''.

df1['fullname'] = df1['firstname'] + ' ' + df1['lastname'].fillna('')

Output:

  firstname lastname      fullname
0      jack      NaN          jack
1      john   obrien   john obrien
2    donald    trump  donald trump
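
One thing to watch for: because the separator is always added, the row with a missing lastname ends up as 'jack ' with a trailing space. Below is a minimal sketch, assuming you also want to guard against NaN in firstname and trim that space. The .str.strip() step and the second .fillna('') are additions for illustration, not part of the answer above. It also shows why str(df1['firstname']) in the question produced the odd result.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                    'lastname': [np.nan, 'obrien', 'trump']})

# str() on a Series returns ONE string (the printed repr of the whole Series),
# which then gets broadcast to every row when assigned to a column.
whole = str(df1['firstname'])

# Fill NaN in both columns, join with a space, then strip any stray
# leading/trailing space left behind by an empty side.
df1['fullname'] = (
    df1['firstname'].fillna('') + ' ' + df1['lastname'].fillna('')
).str.strip()

print(df1)
```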


1

You may also use .add and specify a fill_value

df1.firstname.add(" ").add(df1.lastname, fill_value="")

PS: Chaining many .add calls or + operators is not recommended for strings, but for one or two columns you should be fine.

1 Comment

Agreed, and +1. Using + is not good for chaining, but it is a workable solution in this case.
0

df1['fullname'] = df1['firstname']+df1['lastname'].fillna('')

1 Comment

It might be safer to apply fillna to the firstname column as well, but that is hard to say without the complete data.
0

There is also Series.str.cat which can handle NaN and includes the separator.

df1["fullname"] = df1["firstname"].str.cat(df1["lastname"], sep=" ", na_rep="")

   firstname lastname      fullname
 0      jack      NaN         jack
 1      john   obrien   john obrien
 2    donald    trump  donald trump
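
Note that with na_rep="" the separator is still emitted, so the first row is actually 'jack ' (the trailing space is just not visible in the printed frame). A small sketch, assuming you want that trimmed; the .str.strip() call is an addition for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                    'lastname': [np.nan, 'obrien', 'trump']})

# NaN is replaced by na_rep, then the pieces are joined with sep,
# so a missing lastname leaves a trailing separator behind.
raw = df1['firstname'].str.cat(df1['lastname'], sep=' ', na_rep='')

# Trim the leftover separator.
df1['fullname'] = raw.str.strip()

print(df1)
```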


0

Here is what I would do (for the case where more than two columns need to be joined):

df1.stack().groupby(level=0).agg(' '.join)
Out[57]: 
0            jack
1     john obrien
2    donald trump
dtype: object
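
The stack() approach relies on NaN being dropped during stacking, which, as far as I know, depends on the pandas version and the options passed to stack(). Below is a hedged alternative sketch that drops NaN explicitly for each row. The middlename column is hypothetical, added only to show more than two columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'firstname': ['jack', 'john', 'donald'],
                   'middlename': [np.nan, 'f', np.nan],   # hypothetical column
                   'lastname': [np.nan, 'obrien', 'trump']})

# For each row, drop the missing values and join what is left with a space.
# Row-wise apply is simple but slow on large frames.
full = df.apply(lambda row: ' '.join(row.dropna()), axis=1)

print(full)
```

Because missing values are removed before joining, there are no stray or doubled spaces to clean up afterwards.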

