1

I have a Dataframe df like the following:

                    Warehouse        Date                Count
0     Delhivery Goa Warehouse     2022-05-12                83
1     Delhivery Goa Warehouse     2022-05-15                 1
2     Delhivery Goa Warehouse     2022-05-18               100
3     Delhivery Tauru Warehouse   2022-05-19               100
4     Delhivery Tauru Warehouse   2022-05-20               100

and another dataframe df_orig like the following:

              index                          Goa    Tauru    
0     2022-05-12Delhivery Goa Warehouse     100.0     0.0   
1     2022-05-15Delhivery Goa Warehouse     100.0     0.0   
2     2022-05-18Delhivery Goa Warehouse     100.0     0.0   
3     2022-05-20Delhivery Tauru Warehouse    0.0     50.0   
4     2022-05-19Delhivery Tauru Warehouse    0.0     70.0   

How can I pick values from the df_orig columns based on combination of warehouse and Date columns of the df?

Expected output:

                    Warehouse        Date                Count      original
0     Delhivery Goa Warehouse     2022-05-12                83       100
1     Delhivery Goa Warehouse     2022-05-15                 1       100
2     Delhivery Goa Warehouse     2022-05-18               100       100
3     Delhivery Tauru Warehouse   2022-05-19               100       70
4     Delhivery Tauru Warehouse   2022-05-20               100       50

My approach:

df['index1'] = str(df['Date']) + str(df['Warehouse'])
original = []
for index, row in df.iterrows():
    if row['index1'] == df_orig['index']:
        original.append(????)
2
  • You could use .merge() and merge on the index1 you create and the index in df_orig Commented Sep 14, 2022 at 7:09
  • @EmiOB I can but I want to pick values from only those column which matches name in the index Commented Sep 14, 2022 at 7:15

2 Answers 2

2

Concatenate Date and Warehouse columns in first dataframe, and calculate the sum of values of all the columns except first using iloc[:,1:], then merge two dataframes, and finally take only the columns of interest:

(df
.assign(index=df['Date'] + df['Warehouse'])
.merge(df_orig.assign(original=df_orig.iloc[:,1:].sum(1)))
)[['Warehouse', 'Date', 'Count', 'original']]

OUTPUT:


                   Warehouse        Date  Count  original
0    Delhivery Goa Warehouse  2022-05-12     83     100.0
1    Delhivery Goa Warehouse  2022-05-15      1     100.0
2    Delhivery Goa Warehouse  2022-05-18    100     100.0
3  Delhivery Tauru Warehouse  2022-05-19    100      70.0
4  Delhivery Tauru Warehouse  2022-05-20    100      50.0
Sign up to request clarification or add additional context in comments.

3 Comments

This works absolutely fine but what if columns in df_orig are dynamic, in future there could be more columns like Goa, Tauru, foo, bar, etc
@RahulSharma look at the updated answer
@RahulSharma Apparently you can also decide which columns you want to exclude df_orig[[c for c in df_orig if c!='index']].sum(1)
1

merge() works here. You can also map the sum of the values in df_orig to df rows using map() method. As index column in df_orig is the same as the concatenation of Date and Warehouse columns in df, first concatenate those columns to make the mapping keys match.

# map the sum of the values in df_orig to df.Warehouse via df_orig.index
df['original'] = (df['Date'].astype(str)+df['Warehouse']).map(df_orig.set_index('index').sum(1))

res

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.