
I have a dataframe of customers with some items, which looks like this:

Customer ID     Item
     1         Banana
     1         Apple
     2         Orange
     3         Grape
     4         Banana
     4         Apple
     5         Orange
     5         Grape
     6         Orange

What I want to do is remove every customer whose items duplicate those of an earlier customer, so the result should look like this:

Customer ID     Item
     1         Banana
     1         Apple
     2         Orange
     3         Grape
     5         Orange
     5         Grape

Customer 4 has the same items as customer 1, and customer 6 has the same items as customer 2.
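To make the duplicate rule concrete, here is a small sketch that rebuilds the sample data and reports, for each duplicate customer, the earlier customer with the same items. It assumes the order of items within a customer does not matter (hence `frozenset`); the names `item_sets`, `first_owner` and `duplicates` are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 4, 4, 5, 5, 6],
    "Item": ["Banana", "Apple", "Orange", "Grape", "Banana", "Apple",
             "Orange", "Grape", "Orange"],
})

# One frozenset of items per customer (assumes item order is irrelevant)
item_sets = df.groupby("Customer ID")["Item"].apply(frozenset)

# Map each distinct item set to the first customer ID that has it
first_owner = {}
for cid, items in item_sets.items():
    first_owner.setdefault(items, cid)

# Customers whose item set was already seen under an earlier ID
duplicates = {cid: first_owner[items]
              for cid, items in item_sets.items()
              if first_owner[items] != cid}
print(duplicates)  # {4: 1, 6: 2}
```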

Thanks in advance for your help!

1 Answer


Not sure if this is what you mean, but if you mean duplicates based on the items, you can collect each customer's items into a frozenset (if a customer never has the same item twice) or a tuple (if they can), apply drop_duplicates to the result, and then filter the original data frame on the customer IDs that survive.

df[df["Customer ID"].isin(df.groupby("Customer ID").Item.apply(frozenset).drop_duplicates().index)]
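Here is the one-liner above unpacked into steps on the question's sample data, as a sketch. It relies on `drop_duplicates()` keeping the first occurrence by default, and on `groupby` sorting the customer IDs, so the lowest ID is kept for each distinct item set.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 4, 4, 5, 5, 6],
    "Item": ["Banana", "Apple", "Orange", "Grape", "Banana", "Apple",
             "Orange", "Grape", "Orange"],
})

# Series indexed by Customer ID, holding each customer's items as a frozenset
item_sets = df.groupby("Customer ID")["Item"].apply(frozenset)

# Keep the first customer for each distinct item set; the index holds their IDs
unique_customers = item_sets.drop_duplicates().index

# Filter the original rows down to those customers
result = df[df["Customer ID"].isin(unique_customers)]
print(result)
```

The result contains customers 1, 2, 3 and 5 with their original rows, matching the expected output in the question.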


Or, if a customer can have the same item more than once and the order doesn't matter:

df[df["Customer ID"].isin(df.groupby("Customer ID")
                            .Item.apply(lambda x: tuple(sorted(x)))
                            .drop_duplicates().index)]
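The choice of key matters when items repeat. The sketch below uses hypothetical data (not from the question) to contrast the two keys: `frozenset` ignores both order and repeats, while `tuple(sorted(x))` ignores order but keeps repeats. The helper `keep_first_per_key` is a hypothetical name that just wraps the pattern from the answer.

```python
import pandas as pd

# Hypothetical data: customer 1 buys Apple twice, customer 2 once;
# customers 3 and 4 buy the same two items in a different order
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 3, 4, 4],
    "Item": ["Apple", "Apple", "Apple", "Banana", "Apple", "Apple", "Banana"],
})

def keep_first_per_key(df, key_func):
    """Keep the rows of the first customer for each distinct item key."""
    keys = df.groupby("Customer ID")["Item"].apply(key_func)
    return df[df["Customer ID"].isin(keys.drop_duplicates().index)]

# frozenset: {Apple} == {Apple}, so customer 2 is dropped along with 4
as_set = keep_first_per_key(df, frozenset)

# sorted tuple: ("Apple", "Apple") != ("Apple",), so only customer 4 is dropped
as_sorted_tuple = keep_first_per_key(df, lambda x: tuple(sorted(x)))

print(as_set["Customer ID"].unique().tolist())           # [1, 3]
print(as_sorted_tuple["Customer ID"].unique().tolist())  # [1, 2, 3]
```

A plain `tuple` without `sorted` would also keep customer 4, since ("Banana", "Apple") and ("Apple", "Banana") compare unequal.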