How to combine multiple rows of strings into one using pandas?

Question

I have a DataFrame with multiple rows. Is there any way in which they can be combined to form one string?

For example:

     words
0    I, will, hereby
1    am, gonna
2    going, far
3    to
4    do
5    this

Expected output:

I, will, hereby, am, gonna, going, far, to, do, this

What is the type of the elements? I am guessing 0, 1, etc. is the index, right?
– Anand S Kumar, Commented Oct 22, 2015 at 11:30

Alex Riley · Accepted Answer · 2015-10-22 11:36:42Z

54

You can use str.cat to join the strings in each row. For a Series or column s, write:

>>> s.str.cat(sep=', ')
'I, will, hereby, am, gonna, going, far, to, do, this'
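
For completeness, here is a minimal self-contained sketch of the same call on the question's data. It assumes each row of the column is already a plain string and that the column is named words, as in the question; the variable name combined is just for illustration.

```python
import pandas as pd

# Rebuild the frame from the question (assumes every row is a plain string).
df = pd.DataFrame({
    "words": ["I, will, hereby", "am, gonna", "going, far", "to", "do", "this"]
})

# The .str accessor lives on a Series, so select the column first.
combined = df["words"].str.cat(sep=", ")
print(combined)
```

Calling df.str.cat(...) on the whole DataFrame would fail, because .str is only defined on a Series or a single column.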

answered Oct 22, 2015 at 11:36

Alex Riley


8 Comments

eclairs Over a year ago

I tried the code above. It gives me an error: AttributeError: 'DataFrame' object has no attribute 'str'. Is this because there are blank rows in the DataFrame as well? If so, how can I fix it?

Alex Riley Over a year ago

The .str accessor only works on a Series or a single column of a DataFrame (not an entire DataFrame). If you want to apply this method to multiple columns of a DataFrame, you'll need to use it on each column individually in turn.

eclairs Over a year ago

Thanks, could you also help me with the syntax for the above? If I want to concatenate the rows of column 'words' of DataFrame df, how should I write it? Thanks for your help!

Alex Riley Over a year ago

Sure - to apply the method to the 'words' column, you could write df['words'].str.cat(sep=', ') (where df is the name of your DataFrame).

Zero Over a year ago

I'm surprised str.cat is slower than the join() method. Do check the solution and timings below.


Zero · Answer · 2016-12-31 11:49:46Z

28

How about Python's built-in str.join? It's also faster.

In [209]: ', '.join(df.words)
Out[209]: 'I, will, hereby, am, gonna, going, far, to, do, this'

Timings in Dec, 2016 on pandas 0.18.1

In [214]: df.shape
Out[214]: (6, 1)

In [215]: %timeit df.words.str.cat(sep=', ')
10000 loops, best of 3: 72.2 µs per loop

In [216]: %timeit ', '.join(df.words)
100000 loops, best of 3: 14 µs per loop

In [217]: df = pd.concat([df]*10000, ignore_index=True)

In [218]: df.shape
Out[218]: (60000, 1)

In [219]: %timeit df.words.str.cat(sep=', ')
100 loops, best of 3: 5.2 ms per loop

In [220]: %timeit ', '.join(df.words)
100 loops, best of 3: 1.91 ms per loop

answered Dec 31, 2016 at 11:49

Zero


3 Comments

Alex Riley Over a year ago

Interesting timings, I get a similar result on 0.19.2. However, I think the trade-off here is that str.cat will seamlessly handle missing values like NaN and None (you can even supply the na_rep argument to choose how to represent these missing values). Python's join fails here. You can get around this by filtering-out/filling-in missing values and then joining, but this slows it right back down. Filling missing values like this also fails if the column holds categorical values; str.cat just works.
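
To make the trade-off concrete, here is a small sketch with made-up data (the Series s and the "?" placeholder are only for illustration). It shows str.cat skipping or replacing missing values, plain join failing on them, and the dropna workaround mentioned above.

```python
import numpy as np
import pandas as pd

# Illustrative data with two kinds of missing values.
s = pd.Series(["I, will", np.nan, "to", None, "this"])

# str.cat omits missing values by default ...
skipped = s.str.cat(sep=", ")
# ... or substitutes a placeholder when na_rep is given.
filled = s.str.cat(sep=", ", na_rep="?")

# Plain join raises TypeError as soon as it meets a non-string entry.
try:
    ", ".join(s)
    join_failed = False
except TypeError:
    join_failed = True

# Workaround: drop the missing values first, then join.
joined = ", ".join(s.dropna())

print(skipped)
print(filled)
print(join_failed)
```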

PV8 Over a year ago

How does this work if I do not want comma separators? What if my output should be: I will hereby am gonna going far to do this

Lee He Over a year ago

@PV8 you can try " ".join(...) instead of ", ".join(...)
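
One caveat for the sample data in the question: some rows already contain commas (for example 'I, will, hereby'), so changing only the separator leaves those in place. A minimal sketch, assuming the frame from the question, that removes the embedded ", " before joining:

```python
import pandas as pd

df = pd.DataFrame({
    "words": ["I, will, hereby", "am, gonna", "going, far", "to", "do", "this"]
})

# Changing only the separator keeps the commas that are inside each row.
only_separator = " ".join(df["words"])

# Replace the embedded ", " with a space first (literal match, not a regex).
cleaned = df["words"].str.replace(", ", " ", regex=False)
no_commas = " ".join(cleaned)

print(only_separator)
print(no_commas)
```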

MarredCheese · Answer · 2019-04-05 19:06:45Z

18

If you have a DataFrame rather than a Series and you want to concatenate values (text values only, I think) from different rows, using another column as the 'group by' key, you can use the .agg method of the DataFrameGroupBy class. See the pandas API documentation for DataFrameGroupBy.agg.

Sample code tested with Pandas v0.18.1:

import pandas as pd

df = pd.DataFrame({
    'category': ['A'] * 3 + ['B'] * 2,
    'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
    'num': range(1, 6)
})

df.groupby('category').agg({
    'name': lambda x: ', '.join(x),
    'num': lambda x: x.max()
})
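
As a variation on the snippet above, here is a sketch that keeps the aggregated result in a variable and passes the bound method ', '.join and the aggregation name 'max' instead of lambdas. The name result is only illustrative, and I have not timed this variant against the lambda version.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A"] * 3 + ["B"] * 2,
    "name": ["A1", "A2", "A3", "B1", "B2"],
    "num": range(1, 6),
})

# agg returns a new frame indexed by the group key, so keep the result.
# ", ".join is called once per group; "max" is the built-in aggregation name.
result = df.groupby("category").agg({"name": ", ".join, "num": "max"})
print(result)
```

The result has one row per category, with the names joined into one string and the largest num of each group.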

edited Apr 5, 2019 at 19:06

MarredCheese


answered Sep 20, 2016 at 13:16

Zhong Dai


3 Comments

Rutger Hofste Over a year ago

Minor comment: you need to assign the result to a new DataFrame, i.e. df2 = df.groupby(...)

Edgar Over a year ago

groupby with agg and lambda is quite slow on larger dataframes... is there a way to speed this up?

Firdaus Over a year ago

dude thx for this, it rlly helps me to solve another groupby problem

Andreas · Answer · 2019-04-22 07:44:33Z

0

For anyone who wants to know how to combine multiple rows of strings in a DataFrame,
here is a method that concatenates strings within a 'window-like' range of nearby rows:

# add a column built from 'window-like' neighbouring rows
# (windows_size is the number of rows per window; str.cat already returns a Series)
df['windows_key_list'] = df['key'].str.cat(
    [df.groupby(['bycol']).shift(-i)['key'] for i in range(1, windows_size)],
    sep=' '
)

Note: A plain groupby aggregation can't do this, because the goal is to combine nearby rows, not all rows that share the same id.
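
A small worked sketch of the idea, with made-up data. The column names bycol and key follow the snippet above; the sample values, windows_size = 2, and the use of na_rep plus str.strip to handle the last rows of each group are my own assumptions, not part of the original method.

```python
import pandas as pd

df = pd.DataFrame({
    "bycol": ["a", "a", "a", "b", "b"],
    "key": ["w1", "w2", "w3", "x1", "x2"],
})
windows_size = 2  # each row is joined with the next (windows_size - 1) rows of its group

# One shifted copy of 'key' per extra row in the window; shifting happens within each group.
shifted = [df.groupby("bycol")["key"].shift(-i) for i in range(1, windows_size)]

# Rows at the end of a group have no neighbour, so the shifted value is missing.
# na_rep="" keeps those rows instead of turning them into NaN; strip removes the trailing space.
df["windows_key_list"] = df["key"].str.cat(shifted, sep=" ", na_rep="").str.strip()
print(df)
```

Without na_rep, the last row of each group would come out as NaN, because str.cat propagates missing values when other columns are passed in.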

edited Apr 22, 2019 at 7:44

Andreas


answered Apr 22, 2019 at 7:31

Kevin Chou

