I have this dataset:
PRODUCT_ID SALE_DATE SALE_PRICE PROVIDER
1 01/02/16 25 1
1 02/10/16 60 1
1 01/11/16 63 2
1 09/10/16 65 3
2 11/11/15 54 1
2 13/01/16 34 2
3 19/05/14 45 1
3 15/10/15 38 1
3 16/06/14 53 2
3 18/10/15 58 2
This is a combined dataset with data from different providers, and there is no common identifier for each sale. The issue is that each data provider reports a slightly different date and price for the same sale, so I am trying to group the rows that belong to a single sale under one group ID. The business logic is that data provider 1 is the first to get sale data. For a given product ID, if the sale date from provider 2 or 3 is within 1 month of a provider 1 sale and the price is within $10 of it (higher or lower), we consider them the same sale; otherwise it is considered a different sale. The output should look like this:
PRODUCT_ID SALE_DATE SALE_PRICE PROVIDER SALE_GROUP_ID
1 01/02/16 25 1 1
1 02/10/16 60 1 2
1 01/11/16 63 2 2
1 09/10/16 65 3 2
2 11/11/15 54 1 3
2 13/01/16 34 2 4
3 19/05/14 45 1 5
3 15/10/15 38 1 6
3 16/06/14 53 2 5
3 18/10/15 58 2 7
How do I achieve this in pandas? Can someone help, please? Thanks.
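For reference, here is a minimal sketch of the rule as I understand it, not a definitive solution. It makes several assumptions that are not settled above: dates are day-first (`dd/mm/yy`), "within 1 month" is approximated as 31 days, every provider 1 row starts its own sale, a row matching several provider 1 sales goes to the one closest in date, and unmatched rows from providers 2 and 3 each get a new ID. The function name `assign_sale_groups` and its parameters are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "PRODUCT_ID": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "SALE_DATE": ["01/02/16", "02/10/16", "01/11/16", "09/10/16", "11/11/15",
                  "13/01/16", "19/05/14", "15/10/15", "16/06/14", "18/10/15"],
    "SALE_PRICE": [25, 60, 63, 65, 54, 34, 45, 38, 53, 58],
    "PROVIDER": [1, 1, 2, 3, 1, 2, 1, 1, 2, 2],
})
# Assumption: dates are day-first (dd/mm/yy).
df["SALE_DATE"] = pd.to_datetime(df["SALE_DATE"], format="%d/%m/%y")


def assign_sale_groups(df, max_days=31, max_price_diff=10):
    """Hypothetical helper: label each row with a SALE_GROUP_ID.

    Assumption: "within 1 month" is treated as at most `max_days` days.
    """
    group_ids = pd.Series(0, index=df.index, dtype="int64")
    next_id = 1
    # sort=False keeps products in their order of first appearance.
    for _, product in df.groupby("PRODUCT_ID", sort=False):
        anchors = product[product["PROVIDER"] == 1]
        others = product[product["PROVIDER"] != 1]

        # Each provider 1 row starts its own sale.
        for idx in anchors.index:
            group_ids.at[idx] = next_id
            next_id += 1

        # Try to attach every other row to a provider 1 sale.
        for idx, row in others.iterrows():
            day_gap = (anchors["SALE_DATE"] - row["SALE_DATE"]).abs()
            price_gap = (anchors["SALE_PRICE"] - row["SALE_PRICE"]).abs()
            close = (day_gap <= pd.Timedelta(days=max_days)) & (
                price_gap <= max_price_diff
            )
            candidates = day_gap[close]
            if candidates.empty:
                # No provider 1 sale is close enough: treat as a new sale.
                group_ids.at[idx] = next_id
                next_id += 1
            else:
                # Several matches: take the anchor closest in date.
                group_ids.at[idx] = group_ids.at[candidates.idxmin()]
    return df.assign(SALE_GROUP_ID=group_ids)


result = assign_sale_groups(df)
print(result)
```

On the sample data this gives the `SALE_GROUP_ID` column shown in the expected output. It loops row by row, so it is meant to pin down the rule rather than to be fast on a large dataset.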
18/10/15 and 11/11/15 are within one month and have $10 in price difference. Are they the same sale? 19/05/14 and 16/06/14 have the same PRODUCT_ID, are within one month, have $10 in price difference, and are the same sale. Do their sale IDs need to be in sequence?