Remove duplicate words in strings in column in every row in data frame

Question

I am trying to remove duplicate words in strings in my data frame per row.

Say my data frame looks like this:

In:
Yes Yes Absolutely
No No Nope   
Win Win Lose



for row in df.iterrows():
    row["Sentence"] = (list(set(row["Sentence"])))

Desired Out:
Yes Absolutely
No Nope
Win Lose

How can I clean each row to remove the duplicate words? I have tried the code above.

Any links to any docs or sources would be greatly appreciated if they can lead me in the right direction. Thank you.

anky · Accepted Answer · 2019-03-10 15:45:42Z

3

You can use the following (assuming the column name is 0):

from collections import OrderedDict
df[0].str.split().apply(lambda x: ','.join(OrderedDict.fromkeys(x).keys()))

0    Yes,Absolutely
1           No,Nope
2          Win,Lose

Note that you can also use set:

df[0].str.split().apply(lambda x: ','.join(list(set(x))))

However, set does not preserve the original word order.
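As a minimal sketch of the order-preserving approach: on Python 3.7+ a plain dict keeps insertion order, so dict.fromkeys can stand in for OrderedDict. The column name "Sentence" is an assumption taken from the question's attempt, and the helper name drop_repeated_words is hypothetical. Joining with a space instead of a comma gives the output format the question asks for:

```python
import pandas as pd

# Hypothetical frame mirroring the question; the column name "Sentence" is assumed.
df = pd.DataFrame({"Sentence": ["Yes Yes Absolutely", "No No Nope", "Win Win Lose"]})

def drop_repeated_words(text):
    # dict.fromkeys keeps only the first occurrence of each word,
    # in the order the words appear (guaranteed on Python 3.7+).
    return " ".join(dict.fromkeys(text.split()))

df["Sentence"] = df["Sentence"].apply(drop_repeated_words)
print(df["Sentence"].tolist())
# ['Yes Absolutely', 'No Nope', 'Win Lose']
```

This comparison is case-sensitive, so "Yes" and "yes" would be treated as different words.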

edited Mar 10, 2019 at 15:45

answered Mar 10, 2019 at 15:41


1 Comment

Learning Developer Over a year ago

It says I have to wait 7 minutes to accept an answer. I will give it a click when the time's up!
