11

Given the following:

df = pd.DataFrame({'col1' : ["a","b"],
            'col2'  : ["ab",np.nan], 'col3' : ["w","e"]})

I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN.

so that I would get something like that for example:

a*ab*w
b*e

Any ideas?

Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).

1
  • I test my answer with this: df = pd.DataFrame({'col1' : ["a","b",3,'ñ'], 'col2' : ["ab",np.nan, 4,'ñ'], 'col3' : ["w","e", 6,'ñ']}) and it still worked Commented May 1, 2015 at 9:23

6 Answers 6

15
In [68]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
df
Out[68]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e

UPDATE

If you have ints or float you can convert these to str first:

In [74]:

df = pd.DataFrame({'col1' : ["a","b",3],
            'col2'  : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
df
Out[74]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
In [76]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[76]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6

Another update

In [81]:

df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
            'col2'  : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
df
Out[81]:
  col1 col2 col3
0    a   ab    w
1    b  NaN    e
2    3    4    6
3    ñ    ü    á

In [82]:

df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
​
df
Out[82]:
  col1 col2 col3 new_col
0    a   ab    w  a*ab*w
1    b  NaN    e     b*e
2    3    4    6   3*4*6
3    ñ    ü    á   ñ*ü*á

My code still works with Spanish characters

Sign up to request clarification or add additional context in comments.

3 Comments

Great seems to work perfectly. One issue, if I have a integers in the mix, python isn't happy. Any idea how to convert things to a string on the fly maybe?
Thanks! Damn I also have annoying characters (letters of spanish so I get the following error): 'ascii' codec can't encode character u'\xf3' in position 6: ordinal not in range(128)
try to solve that last error and if you can't, ask another question, as it is unrelated to your initial question
3
In [1556]: df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
Out[1556]: 
0    a*ab*w
1       b*e
2     3*4*�
3     ñ*ü*á
dtype: object

2 Comments

Oops didn't see this answer was the first. Great I should have thought of using apply. I ended up having a few issues with integers and special character (like spanish letters). The answer bellow solves my issue with integers but waiting for an answer on how to deal with special characters like u'\xf3'.
@Bastien my answer still works for ``'\xf3'` but I'm running python 3
3

using pandas.Series.str.cat function:

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(col1=["a","b"],
                       col2=["ab",np.nan], 
                       col3=["w","e"]))
df.T.apply(lambda c: c.str.cat(sep='*'))

will give

0    a*ab*w
1       b*e
dtype: object

if you have mix of int, float and string, you can use:

df.astype(str).T.apply(lambda c: c.replace('nan', np.nan).str.cat(sep='*'))

Comments

2

You can use dropna()

df['col4'] = df.apply(lambda row: '*'.join(row.dropna()), axis=1)

UPDATE:

Since, you need to convert numbers and special chars too, you can use astype(unicode)

In [37]: df = pd.DataFrame({'col1': ["a", "b"], 'col2': ["ab", np.nan], "col3": [3, u'\xf3']})

In [38]: df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)
Out[38]: 
0    a*ab*3
1       b*ó
dtype: object

In [39]: df['col4'] = df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)

In [40]: df
Out[40]: 
  col1 col2 col3    col4
0    a   ab    3  a*ab*3
1    b  NaN    ó     b*ó

Comments

1
df.apply(lambda row: '*'.join(row.dropna()), axis=1)

Comments

1
for row in xrange(len(df)):
    s = '*'.join(df.ix[row].dropna().tolist())
    print s

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.