I have a dataframe like this:

item     tags
1        awesome, awesome, great
2        cool, fun
3        boring, boring, average
4        ok, expensive

How can I remove the duplicate tags to get:

item     tags
1        awesome, great
2        cool, fun
3        boring, average
4        ok, expensive

  • A pandas DataFrame is not the ideal data structure for this. You should parse the data before putting it into the DataFrame. Commented Nov 9, 2019 at 20:26
  • I’ll second what @rafaelc said. When you’ve got, say, strings or lists in your DataFrames, it’s often a bad sign. It leads to confusion, which is visible even here: you say that you want to remove “duplicate strings”, then call them “tags”. You aren’t removing duplicate strings, and they clearly aren’t just ordinary text. Commented Nov 9, 2019 at 22:19
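
One way to read the comments above is to clean the tags before any DataFrame is built. The following is only a minimal sketch under that assumption: the raw data is taken to arrive as (item, tag string) pairs, and `parse_tags`, `raw_rows`, and `long_rows` are hypothetical names that do not appear in the question.

```python
def parse_tags(raw, sep=","):
    """Split a raw tag string into unique, stripped tags in first-seen order."""
    seen = {}  # dict keys keep insertion order (Python 3.7+)
    for tag in raw.split(sep):
        tag = tag.strip()
        if tag:  # skip empty pieces such as a trailing separator
            seen.setdefault(tag, None)
    return list(seen)

# Assumed raw input: one (item, tag string) pair per row, as in the question.
raw_rows = [
    (1, "awesome, awesome, great"),
    (2, "cool, fun"),
    (3, "boring, boring, average"),
    (4, "ok, expensive"),
]

# One (item, tag) pair per row, with duplicates already dropped.
long_rows = [(item, tag) for item, raw in raw_rows for tag in parse_tags(raw)]
```

If this layout suits the problem, `pd.DataFrame(long_rows, columns=['item', 'tag'])` would give one row per item and tag, so the duplicate-tag question no longer arises inside a single cell.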

2 Answers

Use a list comprehension with str.split, pd.unique, and join:

df['unique_tags'] = [', '.join(pd.unique(x)) for x in df.tags.str.split(', ')]

Out[145]:
   item                     tags      unique_tags
0     1  awesome, awesome, great   awesome, great
1     2                cool, fun        cool, fun
2     3  boring, boring, average  boring, average
3     4            ok, expensive    ok, expensive
2 Comments

While OP’s question is clearly problematic, this is a nice solution. Would it still work if the list comprehension were replaced with a generator expression?
@AlexanderCécile: Direct assignment, whether through the dictionary-style mechanism used here or through pandas assign, doesn't call next on the generator, so the elements of the genex are never yielded. With a genex, we still need to wrap it in the list constructor to make it yield its elements.
If I understand correctly, try:

df['new_tags'] = df['tags'].apply(lambda x: ', '.join(set(x.split(', '))))

Output:

   item                     tags         new_tags
0     1  awesome, awesome, great   awesome, great
1     2                cool, fun        cool, fun
2     3  boring, boring, average  average, boring
3     4            ok, expensive    expensive, ok
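
Note that a set is unordered, which is why the last two rows in the output above come back in a different order from the original tags. If the original order matters, here is a minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dicts keep insertion order); `dedupe_tags` is a hypothetical helper name, not something from the question.

```python
def dedupe_tags(tags, sep=", "):
    """Drop repeated tags from a separator-joined string, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct tag, in order of first appearance.
    return sep.join(dict.fromkeys(tags.split(sep)))

# The tag strings from the question, used here as plain Python data.
rows = [
    "awesome, awesome, great",
    "cool, fun",
    "boring, boring, average",
    "ok, expensive",
]
deduped = [dedupe_tags(r) for r in rows]
```

On the DataFrame itself, this could be applied with `df['new_tags'] = df['tags'].apply(dedupe_tags)`.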
