For each merchant_id, I would like to count the duplicates of an article_id where the zip_code is not identical, i.e. the number of distinct zip codes under which the same article is listed. Please see the example below:

Input table

merchant_id     article_id   zip_code 
1               4555         1000
1               4555         1003
1               4555         1002
1               3029         1000
2               7539         1005
2               7539         1005
2               7539         1002
2               1232         1006
3               5555         1000
3               5555         1001
3               5555         1002
3               5555         1003

Desired output

merchant_id     count_duplicate
1                3
2                2
3                4

So far I have only been able to return all duplicate rows, using the code below:

df[df.duplicated('article_id', keep=False)]  # all rows whose article_id occurs more than once
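To make the example reproducible, here is a minimal sketch that rebuilds the sample table as a DataFrame (values copied from the input table, using the article_id column) and runs that attempt. duplicated(..., keep=False) only flags every row whose article_id occurs more than once; it does not group by merchant_id and does not ignore repeated zip codes, so on its own it cannot produce count_duplicate.

```python
import pandas as pd

# Sample data copied from the input table in the question
df = pd.DataFrame({
    'merchant_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'article_id':  [4555, 4555, 4555, 3029, 7539, 7539, 7539, 1232,
                    5555, 5555, 5555, 5555],
    'zip_code':    [1000, 1003, 1002, 1000, 1005, 1005, 1002, 1006,
                    1000, 1001, 1002, 1003],
})

# keep=False marks every occurrence of a repeated article_id as True
dup_rows = df[df.duplicated('article_id', keep=False)]
print(dup_rows)
```

This returns 10 of the 12 rows (only 3029 and 1232 occur once), including both merchant 2 rows with zip_code 1005, which the desired output counts only once.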

Answer

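We can group by merchant_id and article_id, count the distinct zip_code values in each group with nunique ("number of unique values"), and then filter (query) the result to keep only the groups with more than one zip code.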

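dfn = (
    # number of distinct zip codes per (merchant_id, article_id) pair
    df.groupby(['merchant_id', 'article_id'])['zip_code'].nunique()
    .reset_index(name='count_duplicate')
    # keep only articles listed under more than one zip code
    .query('count_duplicate > 1')
)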

   merchant_id  article_id  count_duplicate
1            1        4555                3
3            2        7539                2
4            3        5555                4
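A self-contained sketch that runs the answer's code on the sample data from the question. The last step, which drops article_id to match the two-column output the question asks for, is an addition; it assumes each merchant has at most one article listed under several zip codes, as in the sample. If a merchant had several such articles, their rows would still need to be combined in some way the question does not specify.

```python
import pandas as pd

# Sample data copied from the input table in the question
df = pd.DataFrame({
    'merchant_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'article_id':  [4555, 4555, 4555, 3029, 7539, 7539, 7539, 1232,
                    5555, 5555, 5555, 5555],
    'zip_code':    [1000, 1003, 1002, 1000, 1005, 1005, 1002, 1006,
                    1000, 1001, 1002, 1003],
})

# Distinct zip codes per (merchant_id, article_id); keep only counts above 1
dfn = (
    df.groupby(['merchant_id', 'article_id'])['zip_code'].nunique()
    .reset_index(name='count_duplicate')
    .query('count_duplicate > 1')
)

# Assumed final step: keep only the columns shown in the desired output
result = dfn[['merchant_id', 'count_duplicate']].reset_index(drop=True)
print(result)
```

For the sample data this gives merchant 1 → 3, merchant 2 → 2 and merchant 3 → 4, which matches the desired output table. Note that merchant 2 gets 2 rather than 3 because nunique counts the repeated zip_code 1005 only once.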

Comments

Thanks, it works! Do you have an idea how to count duplicates where the zip_code is identical, i.e. counting, for each zip_code and merchant_id, the number of duplicate article_id values?
Glad it works. I see your second question as a new question, and this one as answered. Feel free to post it separately and I will have a look at it! Btw, don't forget to accept this answer if it helped you :)
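The follow-up was left open in the thread. The sketch below shows one possible reading, stated as an assumption rather than as the answerer's solution: count how many rows share the same merchant_id, zip_code and article_id, and keep only combinations that occur more than once. It uses groupby(...).size(), which counts rows per group, instead of nunique, because here the repeats themselves are what is being counted.

```python
import pandas as pd

# Sample data copied from the input table in the question
df = pd.DataFrame({
    'merchant_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'article_id':  [4555, 4555, 4555, 3029, 7539, 7539, 7539, 1232,
                    5555, 5555, 5555, 5555],
    'zip_code':    [1000, 1003, 1002, 1000, 1005, 1005, 1002, 1006,
                    1000, 1001, 1002, 1003],
})

same_zip = (
    df.groupby(['merchant_id', 'zip_code', 'article_id'])
    .size()                                # rows per (merchant, zip, article)
    .reset_index(name='count_duplicate')
    .query('count_duplicate > 1')          # keep only repeated combinations
    .reset_index(drop=True)
)
print(same_zip)
```

In the sample data only merchant 2 lists article 7539 twice under zip_code 1005, so this returns a single row with count_duplicate equal to 2.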
