
I have a dataset of orders and people who have placed those orders. Orders have a unique identifier, and buyers have a unique identifier across multiple orders. Here's an example of that dataset:

| Order_ID | Order_Date | Buyer_ID |
|----------|------------|----------|
| 123421   | 01/01/19   | a213422  |
| 123421   | 01/01/19   | a213422  |
| 123421   | 01/01/19   | a213422  |
| 346345   | 01/03/19   | a213422  |
| 567868   | 01/05/19   | a346556  |
| 567868   | 01/05/19   | a346556  |
| 234534   | 01/10/19   | a678909  |

I want to be able to filter the dataset to individuals who have only placed one order, even if that order has multiple items:

| Order_ID | Order_Date | Buyer_ID |
|----------|------------|----------|
| 567868   | 01/05/19   | a346556  |
| 567868   | 01/05/19   | a346556  |
| 234534   | 01/10/19   | a678909  |

If I try df[df['Buyer_ID'].map(df['Buyer_ID'].value_counts()) == 1], the resulting dataframe contains only rows where there is a 1-to-1 relationship between Order_ID and Buyer_ID, like this:

| Order_ID | Order_Date | Buyer_ID |
|----------|------------|----------|
| 346345   | 01/03/19   | a213422  |
| 234534   | 01/10/19   | a678909  |

In the result I want, Buyer_ID a213422 should not appear at all because that person has more than one Order_ID.

This leads me to believe that value_counts() is either not the appropriate way to perform this filter, or I'm doing it wrong. What would be the appropriate way to perform this filter?

  • What is the difference between 123421 and 567868? Commented Dec 5, 2019 at 20:56
  • They are just different Order_IDs, with a 1 to many relationship from Order ID to items ordered. They represent a single order. Commented Dec 5, 2019 at 20:57
  • please check my answer Commented Dec 6, 2019 at 13:09

2 Answers


Method 1: boolean indexing with groupby.transform

df[df.groupby('Buyer_ID')['Order_ID'].transform('nunique').eq(1)]

Method 2: Groupby.filter

df.groupby('Buyer_ID').filter(lambda x: x['Order_ID'].nunique()==1)

Method 3: boolean indexing with Series.map

df[df['Buyer_ID'].map(df.groupby('Buyer_ID')['Order_ID'].nunique().eq(1))]

Output

   Order_ID Order_Date Buyer_ID
4    567868   01/05/19  a346556
5    567868   01/05/19  a346556
6    234534   01/10/19  a678909
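To check that the three methods above agree, here is a minimal self-contained sketch. It rebuilds the sample data from the question; the variable names (by_transform, by_filter, by_map) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Order_ID": [123421, 123421, 123421, 346345, 567868, 567868, 234534],
    "Order_Date": ["01/01/19", "01/01/19", "01/01/19", "01/03/19",
                   "01/05/19", "01/05/19", "01/10/19"],
    "Buyer_ID": ["a213422", "a213422", "a213422", "a213422",
                 "a346556", "a346556", "a678909"],
})

# Method 1: broadcast the per-buyer distinct order count to every row.
by_transform = df[df.groupby("Buyer_ID")["Order_ID"].transform("nunique").eq(1)]

# Method 2: keep whole groups whose distinct order count is 1.
by_filter = df.groupby("Buyer_ID").filter(lambda g: g["Order_ID"].nunique() == 1)

# Method 3: build a per-buyer boolean lookup and map it onto the rows.
by_map = df[df["Buyer_ID"].map(df.groupby("Buyer_ID")["Order_ID"].nunique().eq(1))]

# All three keep the same rows (index labels 4, 5 and 6).
assert by_transform.equals(by_filter)
assert by_transform.equals(by_map)
print(by_transform)
```

On larger frames the transform version is usually the one I would try first, since filter calls a Python lambda once per group, but that is a rule of thumb rather than something measured here.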

If you also want to remove the duplicate rows, chain DataFrame.drop_duplicates at the end:

df[df.groupby('Buyer_ID')['Order_ID'].transform('nunique').eq(1)].drop_duplicates()


   Order_ID Order_Date Buyer_ID
4    567868   01/05/19  a346556
6    234534   01/10/19  a678909
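As for why value_counts on Buyer_ID does not work directly: it counts rows per buyer, so a single order with several items is counted several times. A rough sketch of the difference, assuming the sample data from the question (rows_per_buyer and orders_per_buyer are just illustrative names). Dropping duplicate (Buyer_ID, Order_ID) pairs first makes value_counts count orders instead of rows:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Order_ID": [123421, 123421, 123421, 346345, 567868, 567868, 234534],
    "Order_Date": ["01/01/19", "01/01/19", "01/01/19", "01/03/19",
                   "01/05/19", "01/05/19", "01/10/19"],
    "Buyer_ID": ["a213422", "a213422", "a213422", "a213422",
                 "a346556", "a346556", "a678909"],
})

# Counts ROWS per buyer: a213422 -> 4, a346556 -> 2, a678909 -> 1.
rows_per_buyer = df["Buyer_ID"].value_counts()

# Counts distinct ORDERS per buyer: a213422 -> 2, a346556 -> 1, a678909 -> 1.
orders_per_buyer = (
    df.drop_duplicates(subset=["Buyer_ID", "Order_ID"])["Buyer_ID"].value_counts()
)

# Map the per-buyer order count back onto every row and keep the 1s.
single_order_rows = df[df["Buyer_ID"].map(orders_per_buyer).eq(1)]
print(single_order_rows)
```

This gives the same rows as the nunique-based methods above, so it is mostly a matter of taste which one you use.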

1 Comment

I added drop_duplicates to complete my solution

Here's another way you could do it:

import pandas as pd

# | Order_ID | Order_Date | Buyer_ID |
# |----------|------------|----------|
# | 123421   | 01/01/19   | a213422  |
# | 123421   | 01/01/19   | a213422  |
# | 123421   | 01/01/19   | a213422  |
# | 346345   | 01/03/19   | a213422  |
# | 567868   | 01/05/19   | a346556  |
# | 567868   | 01/05/19   | a346556  |
# | 234534   | 01/10/19   | a678909  |

df = pd.DataFrame.from_dict({
    "Order_ID": [123421, 123421, 123421, 346345, 567868, 567868, 234534],
    "Order_Date": ["01/01/19", "01/01/19", "01/01/19", "01/03/19", "01/05/19", "01/05/19", "01/10/19"],
    "Buyer_ID": ["a213422", "a213422", "a213422", "a213422", "a346556", "a346556", "a678909"],
})

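# Count distinct orders per buyer and keep the buyers with exactly one.
buyers_with_one_order = (
    df.groupby("Buyer_ID")
      .agg(num_orders=("Order_ID", "nunique"))
      .query("num_orders == 1")
      .reset_index()["Buyer_ID"]
)

# Inner-join back to the original rows, then collapse repeated item rows.
filtered_df = df.merge(buyers_with_one_order, on="Buyer_ID").drop_duplicates()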

print(filtered_df.to_string(index=False))

# | Order_ID | Order_Date | Buyer_ID |
# |----------|------------|----------|
# | 567868   | 01/05/19   | a346556  |
# | 234534   | 01/10/19   | a678909  |
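If you would rather avoid the merge, a possible variant (not what the code above does, just a sketch of the same idea) is to select the single-order buyers and test membership with Series.isin. This keeps the original index labels. The names single_order_buyers and filtered are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Order_ID": [123421, 123421, 123421, 346345, 567868, 567868, 234534],
    "Order_Date": ["01/01/19", "01/01/19", "01/01/19", "01/03/19",
                   "01/05/19", "01/05/19", "01/10/19"],
    "Buyer_ID": ["a213422", "a213422", "a213422", "a213422",
                 "a346556", "a346556", "a678909"],
})

# Distinct order count per buyer, indexed by Buyer_ID.
orders_per_buyer = df.groupby("Buyer_ID")["Order_ID"].nunique()

# Buyer IDs with exactly one distinct order.
single_order_buyers = orders_per_buyer[orders_per_buyer == 1].index

# Keep rows for those buyers; drop_duplicates collapses repeated item rows.
filtered = df[df["Buyer_ID"].isin(single_order_buyers)]
deduplicated = filtered.drop_duplicates()

print(deduplicated.to_string(index=False))
```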
