4

I have columns in a dataframe (imported from a CSV) containing text like this.

"New york", "Atlanta", "Mumbai"
"Beijing", "Paris", "Budapest"
"Brussels", "Oslo", "Singapore"

I want to collapse/merge all the columns into one single column, like this

New york Atlanta
Beijing Paris Budapest
Brussels Oslo Singapore

How to do it in pandas?

0

5 Answers 5

5

A faster (but uglier) version is with .cat:

df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')

0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
Name: 0, dtype: object

On a larger (10kx5) DataFrame:

%timeit df.apply(" ".join, axis=1)
10 loops, best of 3: 112 ms per loop

%timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
100 loops, best of 3: 4.48 ms per loop
Sign up to request clarification or add additional context in comments.

Comments

3

Suppose you have a DataFrame like so:

>>> df
          0        1          2
0  New york  Atlanta     Mumbai
1   Beijing    Paris   Budapest
2  Brussels     Oslo  Singapore

Then, a simple use of the pd.DataFrame.apply method will work nicely:

>>> df.apply(" ".join, axis=1)
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object

Note, I have to pass axis=1 so that it is applied across the columns, rather than down the rows. I.e:

>>> df.apply(" ".join, axis=0)
0    New york Beijing Brussels
1           Atlanta Paris Oslo
2    Mumbai Budapest Singapore
dtype: object

1 Comment

You can save yourself some characters by df.T.apply(" ".join)
3

Here are a couple more ways:

def pir(df):
    df = df.copy()
    df.insert(2, 's', ' ', 1)
    df.insert(1, 's', ' ', 1)
    return df.sum(1)

def pir2(df):
    df = df.copy()
    return pd.MultiIndex.from_arrays(df.values.T).to_series().str.join(' ').reset_index(drop=True)

def pir3(df):
    a = df.values[:, 0].copy()
    for j in range(1, df.shape[1]):
        a += ' ' + df.values[:, j]
    return pd.Series(a)

Timing

pir3 seems fastest over small df

enter image description here

pir3 still fastest over larger df 30,000 rows

enter image description here

3 Comments

Nitpick: pir3 should be concatenating df.values[ : , 0], df.values[ : , 1] ...
@juanpa.arrivillaga what do you mean. pd.concat? or np.concatenate? Neither of those combine strings. I'd have to use join. Unless I misunderstood what you meant.
I meant the + operator. You fixed it now anyway.
2

If you prefer something more explicit...

Starting with a dataframe df that looks like this:

>>> df
          A         B          C
0  New york   Beijing   Brussels
1   Atlanta     Paris       Oslo
2    Mumbai  Budapest  Singapore

You can create a new column like this:

df['result'] = df['A'] + ' ' + df['B'] + ' ' + df['C']

In this case the result is stored in the 'result' column of the original DataFrame:

          A         B          C                     result
0  New york   Beijing   Brussels  New york Beijing Brussels
1   Atlanta     Paris       Oslo         Atlanta Paris Oslo
2    Mumbai  Budapest  Singapore  Mumbai Budapest Singapore

Comments

1

for the sake of completeness:

In [160]: df1.add([' '] * (df1.columns.size - 1) + ['']).sum(axis=1)
Out[160]:
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object

Explanation:

In [162]: [' '] * (df.columns.size - 1) + ['']
Out[162]: [' ', ' ', '']

Timing against 300K rows DF:

In [68]: df = pd.concat([df] * 10**5, ignore_index=True)

In [69]: df.shape
Out[69]: (300000, 3)

In [76]: %timeit df.apply(" ".join, axis=1)
1 loop, best of 3: 5.8 s per loop

In [77]: %timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
10 loops, best of 3: 138 ms per loop

In [79]: %timeit pir(df)
1 loop, best of 3: 499 ms per loop

In [80]: %timeit pir2(df)
10 loops, best of 3: 174 ms per loop

In [81]: %timeit pir3(df)
10 loops, best of 3: 115 ms per loop

In [159]: %timeit df.add([' '] * (df.columns.size - 1) + ['']).sum(axis=1)
1 loop, best of 3: 478 ms per loop

Conclusion: current winner is @piRSquared's pir3()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.