2
print(dfs["Categorias"])

I'm getting this:

0                         wordpress, criação de sites
1                                    criação de sites
2             e-commerce, criação de sites, wordpress
3                           marketing digital, vendas

How can I remove the repeated items and join the unique values into a list?

Thank you

2
  • 3
    And how exactly do you want the result to look? Commented Oct 4, 2021 at 20:52
  • 1
    What have you tried already? Commented Oct 4, 2021 at 21:09

3 Answers

1

Are you looking for something like this?

Split each row into a list, explode that list into one row per item, then take the unique values of the resulting column.

>>> df['Categorias'].str.split(r',\s+').explode().unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']

Step by step:

>>> df = df['Categorias'].str.split(r',\s+')
>>> df
0                [wordpress, criação de sites]
1                           [criação de sites]
2    [e-commerce, criação de sites, wordpress]
3                  [marketing digital, vendas]
Name: Categorias, dtype: object

>>> df = df.explode()
>>> df
0            wordpress
0     criação de sites
1     criação de sites
2           e-commerce
2     criação de sites
2            wordpress
3    marketing digital
3               vendas
Name: Categorias, dtype: object

>>> df.unique().tolist()
['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']
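For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the output printed in the question, so its construction is an assumption, as is the name `unique_categories`. The extra `.str.strip()` is only a guard against stray whitespace and is not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question's printed output (an assumption).
df = pd.DataFrame({
    "Categorias": [
        "wordpress, criação de sites",
        "criação de sites",
        "e-commerce, criação de sites, wordpress",
        "marketing digital, vendas",
    ]
})

unique_categories = (
    df["Categorias"]
    .str.split(r",\s+")   # each row becomes a list of items
    .explode()            # one item per row, original index repeated
    .str.strip()          # defensive: drop leading/trailing whitespace
    .unique()             # keeps order of first appearance
    .tolist()
)
print(unique_categories)
```

Because `unique()` keeps the order of first appearance, the result should match the list shown above.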

2 Comments

Can you elaborate more on what the code does? It looks interesting.
@AshokArora. I updated my answer with some explanation and step-by-step commands.
1

You could use sets and itertools.chain:

from itertools import chain
set(chain(*df['Categorias'].str.split(r',\s+')))

Output:

{'criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress'}

Optionally, as a list (sets are unordered, so the order of the items may differ from what is shown here):

>>> list(set(chain(*df['Categorias'].str.split(r',\s+'))))
['criação de sites', 'e-commerce', 'marketing digital', 'vendas', 'wordpress']
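If the order of first appearance matters, a set will not keep it. One possible variation, sketched here with a plain list standing in for `df['Categorias'].tolist()` (the names `rows` and `unique_in_order` are made up for illustration), is to deduplicate with `dict.fromkeys`, which preserves insertion order:

```python
import re
from itertools import chain

# Plain list standing in for df['Categorias'].tolist() (an assumption).
rows = [
    "wordpress, criação de sites",
    "criação de sites",
    "e-commerce, criação de sites, wordpress",
    "marketing digital, vendas",
]

# Split every row on a comma followed by whitespace and flatten the result.
items = chain.from_iterable(re.split(r",\s+", row) for row in rows)

# dict keys are unique and keep insertion order, unlike a set.
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)
# ['wordpress', 'criação de sites', 'e-commerce', 'marketing digital', 'vendas']
```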


0

One way is to split the dataframe column into individual items, remove duplicates using a set, and then join the result back together with a string operation.

>>> ', '.join(set(df['Categorias'].str.split(', ').explode().tolist()))
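Since a set has no defined order, the joined string can come out in a different order between runs. A small sketch of one way to make it deterministic is to sort before joining. The list `rows` stands in for the column's values, and both it and the name `joined` are assumptions for illustration:

```python
# Plain list standing in for df['Categorias'].tolist() (an assumption).
rows = [
    "wordpress, criação de sites",
    "criação de sites",
    "e-commerce, criação de sites, wordpress",
    "marketing digital, vendas",
]

# Collect the unique items, stripping whitespace around each one.
unique_items = {item.strip() for row in rows for item in row.split(",")}

# Sorting gives a stable, reproducible order before joining.
joined = ", ".join(sorted(unique_items))
print(joined)
# criação de sites, e-commerce, marketing digital, vendas, wordpress
```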
