Concatenate cells into a string with separator pandas python

Question

Given the following:

df = pd.DataFrame({'col1' : ["a","b"],
            'col2'  : ["ab",np.nan], 'col3' : ["w","e"]})

I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN.

so that I would get something like that for example:

a*ab*w
b*e

Any ideas?

Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).

I test my answer with this: df = pd.DataFrame({'col1' : ["a","b",3,'ñ'], 'col2' : ["ab",np.nan, 4,'ñ'], 'col3' : ["w","e", 6,'ñ']}) and it still worked — EdChum
– EdChum, Commented May 1, 2015 at 9:23

EdChum · Accepted Answer · 2015-05-01 09:30:13Z

15

In [68]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
df
Out[68]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e

UPDATE

If you have ints or float you can convert these to str first:

In [74]:

df = pd.DataFrame({'col1' : ["a","b",3],
            'col2'  : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
df
Out[74]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
In [76]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[76]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6

Another update

In [81]:

df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
            'col2'  : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
df
Out[81]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
3    ñ    ü    á

In [82]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)

df
Out[82]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6
3    ñ    ü    á   ñ*ü*á

My code still works with Spanish characters

edited May 1, 2015 at 9:30

answered May 1, 2015 at 8:59

EdChum

397k204 gold badges836 silver badges583 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Bastien Over a year ago

Great seems to work perfectly. One issue, if I have a integers in the mix, python isn't happy. Any idea how to convert things to a string on the fly maybe?

Bastien Over a year ago

Thanks! Damn I also have annoying characters (letters of spanish so I get the following error): 'ascii' codec can't encode character u'\xf3' in position 6: ordinal not in range(128)

Julien Spronck Over a year ago

try to solve that last error and if you can't, ask another question, as it is unrelated to your initial question

fixxxer · Accepted Answer · 2015-05-01 09:46:46Z

3

In [1556]: df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
Out[1556]: 
0    a*ab*w
1       b*e
2     3*4*�
3     ñ*ü*á
dtype: object

edited May 1, 2015 at 9:46

answered May 1, 2015 at 8:59

fixxxer

16.2k15 gold badges64 silver badges78 bronze badges

2 Comments

Bastien Over a year ago

Oops didn't see this answer was the first. Great I should have thought of using apply. I ended up having a few issues with integers and special character (like spanish letters). The answer bellow solves my issue with integers but waiting for an answer on how to deal with special characters like u'\xf3'.

EdChum Over a year ago

@Bastien my answer still works for ``'\xf3'` but I'm running python 3

lisrael1 · Accepted Answer · 2022-09-07 09:37:55Z

3

using pandas.Series.str.cat function:

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(col1=["a","b"],
                       col2=["ab",np.nan], 
                       col3=["w","e"]))
df.T.apply(lambda c: c.str.cat(sep='*'))

will give

0    a*ab*w
1       b*e
dtype: object

if you have mix of int, float and string, you can use:

df.astype(str).T.apply(lambda c: c.replace('nan', np.nan).str.cat(sep='*'))

answered Sep 7, 2022 at 9:37

lisrael1

4093 silver badges9 bronze badges

Comments

Anish Shah · Accepted Answer · 2015-05-01 09:32:03Z

2

You can use dropna()

df['col4'] = df.apply(lambda row: '*'.join(row.dropna()), axis=1)

UPDATE:

Since, you need to convert numbers and special chars too, you can use astype(unicode)

In [37]: df = pd.DataFrame({'col1': ["a", "b"], 'col2': ["ab", np.nan], "col3": [3, u'\xf3']})

In [38]: df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)
Out[38]: 
0    a*ab*3
1       b*ó
dtype: object

In [39]: df['col4'] = df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)

In [40]: df
Out[40]: 
  col1 col2 col3    col4
0    a   ab    3  a*ab*3
1    b  NaN    ó     b*ó

edited May 1, 2015 at 9:32

answered May 1, 2015 at 9:01

Anish Shah

8,2099 gold badges31 silver badges42 bronze badges

Comments

Zah · Accepted Answer · 2015-05-01 08:55:20Z

1

df.apply(lambda row: '*'.join(row.dropna()), axis=1)

answered May 1, 2015 at 8:55

Zah

6,8541 gold badge26 silver badges34 bronze badges

Comments

Julien Spronck · Accepted Answer · 2015-05-01 09:00:52Z

1

for row in xrange(len(df)):
    s = '*'.join(df.ix[row].dropna().tolist())
    print s

answered May 1, 2015 at 9:00

Julien Spronck

15.5k5 gold badges50 silver badges57 bronze badges

Collectives™ on Stack Overflow

Concatenate cells into a string with separator pandas python

6 Answers 6

3 Comments

2 Comments

Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

3 Comments

2 Comments

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related