6

I'm new to pandas and I'm trying to figure this scenario out: I have a sample DataFrame with two products. df =

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-5-18   Fruit Snacks  1.99
          10    1-8-18   Fruit Snacks  1.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-9-18         Apples  1.49
          45    1-10-18        Apples  1.49
          45    1-13-18        Apples  1.49
          45    1-15-18        Apples  2.99 

I also have another small DataFrame that looks like this (which shows promotional prices of the same products): df2=

  Product_Num   Price 
          10    1.99
          45    1.49 

Notice that df2 does not contain the 'Date' or 'Description' columns. What I want to do is remove all promo-priced rows from df (for all dates that are on promo), using the data from df2. What is the best way to do this?

So, I want to see this:

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-15-18        Apples  2.99 

I was thinking of doing a merge on columns Price and Product_Num, then seeing what I can do from there. But I was getting confused because of the multiple dates.
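For reference, this is roughly the merge I was attempting (I'm not sure it's the right direction, and promo_rows is just a name I made up); it seems to pick out the promo rows, but I don't know how to go from there to dropping them from df:

# inner merge on the two shared columns keeps only the rows of df
# whose (Product_Num, Price) pair also appears in df2, i.e. the promo rows
promo_rows = df.merge(df2, on=['Product_Num', 'Price'], how='inner')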

2 Comments
  • df[df.Price == 2.99] Commented Jan 30, 2018 at 23:07
  • In my large DataFrame, the prices won't all be 2.99 @thomas.mac Commented Jan 30, 2018 at 23:30

4 Answers

9

Use isin with &:

df.loc[~((df.Product_Num.isin(df2['Product_Num']))&(df.Price.isin(df2['Price']))),:]
Out[246]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99

Update

df.loc[~df.index.isin(df.merge(df2.assign(a='key'),how='left').dropna().index)]
Out[260]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99

5 Comments

Won't this also catch (product=10 and price=1.49)?
I like this solution. Can you please explain what df2.assign(a='key') does?
Yes, I have the same question as @jp_data_analysis :)
@jp_data_analysis It adds a new key column. Since df2's columns are a subset of df's, a plain left merge would not change anything :-) By building a new column on df2 and then doing the left merge, we can filter out the unmatched rows by NaN.
@Hana When df2's columns equal (are a subset of) df's, df.merge(df2, how='left') just returns df. Only when df and df2 differ in columns can we tell which rows of df are unmatched with df2 and filter them out.
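For what it's worth, here is a minimal sketch of the same anti-join idea using merge's built-in indicator=True instead of the hand-rolled dummy column (column names are taken from the question's frames; merged and result are just illustrative names):

# left-merge on both shared columns and let pandas flag where each row came from
merged = df.merge(df2, on=['Product_Num', 'Price'], how='left', indicator=True)
# '_merge' is 'both' for promo rows and 'left_only' for everything else
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')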
2

With Product_Num as the index of both DataFrames, you can drop df2's index values from df1 and then concatenate the two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'Product_Num':[1,2,3,4], 'Date': ['01/01/2012','01/02/2013','02/03/2013','04/02/2013'], 'Price': [10,10,10,10]})
df1 = df1.set_index('Product_Num')
df2 = pd.DataFrame({'Product_Num':[2], 'Date':['03/3/2012'], 'Price': [5]})
df2 = df2.set_index('Product_Num')

Drop and concatenate:

df_new = df1.drop(df2.index)
df_new = pd.concat([df_new, df2])

Result:

               Date  Price
Product_Num                   
1            01/01/2012     10
3            02/03/2013     10
4            04/02/2013     10
2             03/3/2012      5


1

You could turn df2 into a dictionary and then filter out the matching values in df:

df[df[df2.columns].isin(df2.to_dict('list')).sum(1) <= 1]

Yields

      Date   Description  Price  Product_Num
0    1-1-18  Fruit Snacks   2.99           10
1    1-2-18  Fruit Snacks   2.99           10
4   1-10-18  Fruit Snacks   2.99           10
5    1-1-18        Apples   2.99           45
6    1-3-18        Apples   2.99           45
7    1-5-18        Apples   2.99           45
11  1-15-18        Apples   2.99           45
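If it helps, here is the one-liner unpacked step by step (hits and result are just illustrative names for the intermediate pieces):

# boolean frame: True wherever a cell's value appears in the matching df2 column
hits = df[df2.columns].isin(df2.to_dict('list'))
# a row is removed only when both Product_Num and Price match some promo value,
# i.e. it is True in 2 columns; rows matching in at most 1 column are kept
result = df[hits.sum(axis=1) <= 1]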


0

Cute and readable:

promo_prices = df2['Price']
promo_prods = df2['Product_Num']

no_pro = df

for price, prod in zip(promo_prices, promo_prods):
    # drop the rows where both the product and its promo price match
    no_pro = no_pro[~((no_pro['Product_Num'] == prod) & (no_pro['Price'] == price))]

1 Comment

Except that it's not considered good practice to use loops with pandas when there are plenty of other solutions, because it is slow and memory-hungry.
