0

I have two Dataframes, and would like to create a new column in DataFrame 1 based on DataFrame 2 values.

But I dont want to join the two dataframes per say and make one big dataframe, but rather use the second Dataframe simply as a look-up.

#Main Dataframe:
df1 = pd.DataFrame({'Size':["Big", "Medium", "Small"], 'Sold_Quantity':[10, 6, 40]})

#Lookup Dataframe
df2 = pd.DataFrame({'Size':["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean':[10, 20, 30]})

#Create column in Dataframe 1 based on lookup dataframe values:
df1['New_Column'] = when df1['Size'] = df2['Size'] and df1['Sold_Quantity'] < df2['Sold_Quantiy_Score_Mean'] then 'Below Average Sales' else 'Above Average Sales!' end

2 Answers 2

2

One approach, is to use np.where:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantity': [10, 6, 40]})
df2 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean': [10, 20, 30]})

condition = (df1['Size'] == df2['Size']) & (df1['Sold_Quantity'] < df2['Sold_Quantiy_Score_Mean'])

df1['New_Column'] = np.where(condition, 'Below Average Sales', 'Above Average Sales!')

print(df1)

Output

     Size  Sold_Quantity            New_Column
0     Big             10  Above Average Sales!
1  Medium              6   Below Average Sales
2   Small             40  Above Average Sales!
Sign up to request clarification or add additional context in comments.

Comments

1

Given that df2 is sort of like a lookup based on Size, it would make sense if your Size column was its index:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantity': [10, 6, 40]})
df2 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean': [10, 20, 30]})

lookup = df2.set_index("Size")

You can then map the Sizes in df1 to their mean and compare each with the sold quantity:

is_below_mean = df1["Sold_Quantity"] < df1["Size"].map(lookup["Sold_Quantiy_Score_Mean"])

and finally map the boolean values to the respective strings using np.where

df1["New_Column"] = np.where(is_below_mean, 'Below Average Sales', 'Above Average Sales!')

df1:

     Size  Sold_Quantity            New_Column
0     Big             10  Above Average Sales!
1  Medium              6   Below Average Sales
2   Small             40  Above Average Sales!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.