How to join items from same column using pandas in python?

Question

print(dfs["Categorias"])

I'm getting this:

0                         wordpress, criação de sites
1                                    criação de sites
2             e-commerce, criação de sites, wordpress
3                           marketing digital, vendas

How can I remove the repeated items and collect the unique values in a list?

Thank you
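
To make the answers easy to try, here is a minimal sketch that rebuilds the sample column as a DataFrame. It assumes the column holds exactly the four strings printed above, and it names the frame `df` as the answers do (the question calls it `dfs`).

```python
import pandas as pd

# Assumed reconstruction of the question's data: the values are copied from
# the printed output; the DataFrame name `df` is an assumption.
df = pd.DataFrame({
    "Categorias": [
        "wordpress, criação de sites",
        "criação de sites",
        "e-commerce, criação de sites, wordpress",
        "marketing digital, vendas",
    ]
})

print(df["Categorias"])
```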

And how exactly do you want the result to look?

– Jab, Commented Oct 4, 2021 at 20:52

What have you tried already?

– s3dev, Commented Oct 4, 2021 at 21:09

Corralien · Accepted Answer · 2021-10-04 21:02:15Z

1

Are you looking for something like this?

Split each row into a list, explode the lists into separate rows, then get the unique values of the column.

>>> df['Categorias'].str.split(r',\s+').explode().unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']

Step by step:

>>> s = df['Categorias'].str.split(r',\s+')
>>> s
0                [wordpress, criação de sites]
1                           [criação de sites]
2    [e-commerce, criação de sites, wordpress]
3                  [marketing digital, vendas]
Name: Categorias, dtype: object

>>> s = s.explode()
>>> s
0            wordpress
0     criação de sites
1     criação de sites
2           e-commerce
2     criação de sites
2            wordpress
3    marketing digital
3               vendas
Name: Categorias, dtype: object

>>> s.unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']

edited Oct 4, 2021 at 21:02

answered Oct 4, 2021 at 20:55

Corralien

2 Comments

Ashok Arora Over a year ago

Can you elaborate more on what the code does? It looks interesting.

Corralien Over a year ago

@AshokArora I updated my answer with some explanation and step-by-step commands.
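
A self-contained sketch of the split/explode/unique approach, assuming the same four sample strings. `Series.unique()` returns values in order of first appearance, so unlike the set-based answers this keeps the order in which each category first occurs.

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({"Categorias": [
    "wordpress, criação de sites",
    "criação de sites",
    "e-commerce, criação de sites, wordpress",
    "marketing digital, vendas",
]})

unique_categories = (
    df["Categorias"]
    .str.split(r",\s+")  # each cell -> list (pattern longer than 1 char is treated as a regex)
    .explode()           # one row per category; the original index is repeated
    .unique()            # distinct values, in order of first appearance
    .tolist()            # NumPy array -> plain Python list
)
print(unique_categories)
# ['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']
```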

mozway · Accepted Answer · 2021-10-04 21:02:11Z

1

You could use sets and itertools.chain:

from itertools import chain
set(chain(*df['Categorias'].str.split(r',\s+')))

Output:

{'criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress'}

Optionally, as a list (set order is arbitrary, so the element order may vary):

>>> list(set(chain(*df['Categorias'].str.split(r',\s+'))))
['criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress']

answered Oct 4, 2021 at 21:02

mozway
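
A variant of the `itertools.chain` approach, sketched under the same sample-data assumption. `chain.from_iterable` takes the Series of lists directly instead of unpacking it with `*`, and `sorted` gives a reproducible order, since the iteration order of a set of strings can change between Python runs.

```python
from itertools import chain

import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({"Categorias": [
    "wordpress, criação de sites",
    "criação de sites",
    "e-commerce, criação de sites, wordpress",
    "marketing digital, vendas",
]})

# Each cell becomes a list of categories.
lists = df["Categorias"].str.split(r",\s+")

# Flatten the lists, deduplicate with a set, then sort for a stable order.
unique_categories = sorted(set(chain.from_iterable(lists)))
print(unique_categories)
# ['criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress']
```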

Ashok Arora · Accepted Answer · 2021-10-04 21:07:47Z

0

One way is to split each entry into its items, explode them into a flat list, remove duplicates using a set, and then join the unique items back into a single string.

>>> ', '.join(set(df['Categorias'].str.split(', ').explode().tolist()))

edited Oct 4, 2021 at 21:07

answered Oct 4, 2021 at 20:57

Ashok Arora
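
If a single comma-separated string is the goal, sorting the set before joining makes the result deterministic; otherwise the order can differ from run to run because string hashing is randomized per process. A sketch assuming the four sample strings, with `drop_duplicates()` shown as an alternative that keeps first-appearance order:

```python
import pandas as pd

# Sample data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({"Categorias": [
    "wordpress, criação de sites",
    "criação de sites",
    "e-commerce, criação de sites, wordpress",
    "marketing digital, vendas",
]})

# One row per category.
categories = df["Categorias"].str.split(", ").explode()

# Sorted, so the joined string is the same on every run.
joined = ", ".join(sorted(set(categories)))
print(joined)
# criação de sites, e-commerce, marketing digital, vendas, wordpress

# Alternative: keep the order in which categories first appear.
joined_in_order = ", ".join(categories.drop_duplicates())
print(joined_in_order)
# wordpress, criação de sites, e-commerce, marketing digital, vendas
```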
