I am trying to remove duplicate words in strings in my data frame per row.

Say my data frame looks like this:

In:
Yes Yes Absolutely
No No Nope   
Win Win Lose



for row in df.iterrows():
    row["Sentence"] = (list(set(row["Sentence"])))

Desired Out:
Yes Absolutely
No Nope
Win Lose

How can I clean each row to remove the duplicate words? I have tried the code above, but it does not give the desired output.

Any links to any docs or sources would be greatly appreciated if they can lead me in the right direction. Thank you.

1 Answer


You can use the following (assuming the column name is 0):

from collections import OrderedDict
df[0].str.split().apply(lambda x: ','.join(OrderedDict.fromkeys(x).keys()))

0    Yes,Absolutely
1           No,Nope
2          Win,Lose
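
For the space-separated output shown in the question, a minimal self-contained sketch might look like the one below. It assumes the column is named Sentence (taken from the loop in the question) and Python 3.7+, where a plain dict preserves insertion order, so OrderedDict is not strictly needed. The helper name dedupe_words is hypothetical.

```python
import pandas as pd

# Sample data mirroring the question; the column name "Sentence" is an assumption.
df = pd.DataFrame(
    {"Sentence": ["Yes Yes Absolutely", "No No Nope   ", "Win Win Lose"]}
)


def dedupe_words(text):
    """Drop repeated words from a string, keeping first occurrences in order."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing spaces; dict.fromkeys keeps the first
    # occurrence of each word in its original position.
    return " ".join(dict.fromkeys(text.split()))


# Assign the result back so the data frame itself is updated.
df["Sentence"] = df["Sentence"].apply(dedupe_words)
print(df["Sentence"].tolist())
```

Note that this removes every repeated word in a row, not only adjacent repeats, and the comparison is case-sensitive ("Yes" and "yes" would both be kept).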

Note that you can also use set:

df[0].str.split().apply(lambda x: ','.join(list(set(x))))

However, a set does not guarantee that the original word order is preserved.
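
To illustrate the ordering point, and why applying set directly to the string (as in the loop from the question) does not work, here is a small sketch in plain Python. The variable names are illustrative only. Separately, df.iterrows() yields (index, row) pairs, and assigning to the row inside the loop is not a reliable way to modify the data frame.

```python
sentence = "Win Win Lose"

# set() applied to a string iterates over its characters, not its words,
# so the result contains single letters and the space character.
chars = set(sentence)

# Splitting first gives a set of words, but the iteration order of a set
# is not guaranteed to match the order in the original string.
words = set(sentence.split())

# dict.fromkeys keeps the first occurrence of each word in order (Python 3.7+).
ordered = list(dict.fromkeys(sentence.split()))
print(ordered)
```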


1 Comment

It says I have to wait 7 minutes to accept an answer; I will click accept when the time's up!
