How to add a column to a data frame based on operations on values in multiple rows of another data frame?

This is my initial data frame:

[image: DF]

and I want the output below:

[image: Output]

where:

[image, no description]

Example:

[image, no description]

So far I have tried building a new data frame from the unique (ord_date, crt_code, del_date) combinations and then computing the score for each row, but I can't figure out how to express the if condition.

df2['score'][(df2['ord_date']==xxxx)&(df2['crt_code']==xxxx)&(df2['del_date']==xxxx)] 

= if(df['val1'][(df['slb_qty']==2)&(df['ord_date']==xxxx)&(df['crt_code']==xxxx)&(df['del_date']==xxxx)] + df['val1'][(df['slb_qty']==12)&(df['ord_date']==xxxx)&(df['crt_code']==xxxx)&(df['del_date']==xxxx)] >=80 ) then 200

Besides, checking all 4 conditions this way would produce a very long statement that is hard to read.

Can anyone suggest a cleaner, simpler way to get my desired output?

1 Answer

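  1. Collect the unique (ord_date, crt_code, del_date) combinations.
  2. For each combination, sum val1 over the rows with slb_qty 2 or 12, and separately over the rows with slb_qty 4 or 14.
  3. Calculate the score from those two sums.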

Next time post data as text, not as images.
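One way to get a data frame into pasteable text is sketched below. It is only an illustration, assuming the frame is already loaded in pandas; the three rows are the first rows of the sample used in this answer, and the names text and as_dict are placeholders.

```python
import pandas as pd

# Small frame built from the first three rows of the sample data.
df = pd.DataFrame({
    "ord_date": ["01/01/2019", "01/01/2019", "01/01/2019"],
    "crt_code": [125, 125, 125],
    "del_date": ["10/01/2019", "10/01/2019", "10/01/2019"],
    "slb_qty": [2, 4, 12],
    "val1": [38, 27, 35],
})

# Plain-text table that can be pasted straight into a question.
text = df.to_string(index=False)
print(text)

# Alternative: a dict of lists that readers can pass back to pd.DataFrame(...).
as_dict = df.to_dict(orient="list")
print(as_dict)
```

Either form lets readers rebuild the frame without retyping values from a screenshot.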

My code with descriptions:

=^..^=

import pandas as pd
from io import StringIO

data = StringIO("""
ord_date crt_code del_date slb_qty val1
01/01/2019 125 10/01/2019 2 38
01/01/2019 125 10/01/2019 4 27
01/01/2019 125 10/01/2019 12 35
01/01/2019 128 10/01/2019 2 45
01/01/2019 128 10/01/2019 4 21
01/01/2019 128 10/01/2019 12 23
01/01/2019 128 10/01/2019 14 24
02/01/2019 125 10/01/2019 2 37
02/01/2019 125 10/01/2019 12 30
02/01/2019 125 10/01/2019 4 29
02/01/2019 128 10/01/2019 14 22
02/01/2019 128 10/01/2019 4 26
02/01/2019 128 10/01/2019 12 21
02/01/2019 128 10/01/2019 2 29
""")

# load data
df = pd.read_csv(data, sep=" ")


# get the unique (ord_date, crt_code, del_date) combinations
df_unique = df.groupby(['ord_date', 'crt_code', 'del_date']).size().reset_index()
# drop the helper count column (named 0) produced by size()
df_unique = df_unique.drop([0], axis=1)


# for each combination, sum val1 where slb_qty is 2 or 12, and where it is 4 or 14
slb_qty_2_12 = []
slb_qty_4_14 = []
for index, row in df_unique.iterrows():
    # select row range from raw data
    selected_rows = df[(df['ord_date'] == row['ord_date']) & (df['crt_code'] == row['crt_code']) & (df['del_date'] == row['del_date'])]
    # find 2 and 12 qty
    rows_2_12 = selected_rows[(selected_rows['slb_qty'] == 2) | (selected_rows['slb_qty'] == 12)]
    # sum values
    values_sum = rows_2_12['val1'].sum()
    # collect data
    slb_qty_2_12.append(values_sum)
    # find 4 and 14 qty
    rows_4_14 = selected_rows[(selected_rows['slb_qty'] == 4) | (selected_rows['slb_qty'] == 14)]
    # sum values
    values_sum = rows_4_14['val1'].sum()
    # collect data
    slb_qty_4_14.append(values_sum)


# add calculated values to data frame
df_unique['slb_qty_2_12'] = slb_qty_2_12
df_unique['slb_qty_4_14'] = slb_qty_4_14


# calculate score
score = []
for index, row in df_unique.iterrows():
    if row['slb_qty_4_14'] >= 80:
        score.append(300)
    elif 80 > row['slb_qty_4_14'] >= 60:
        score.append(150)
    elif row['slb_qty_2_12'] >= 80:
        score.append(200)
    elif 80 > row['slb_qty_2_12'] >= 60:
        score.append(100)
    else:
        score.append(0)


# drop used columns
df_unique = df_unique.drop(['slb_qty_2_12', 'slb_qty_4_14'], axis=1)
# add score
df_unique['Score'] = score

Output:

     ord_date  crt_code    del_date  Score
0  01/01/2019       125  10/01/2019    100
1  01/01/2019       128  10/01/2019    100
2  02/01/2019       125  10/01/2019    100
3  02/01/2019       128  10/01/2019      0
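
The same result can also be computed without explicit loops. The following is a minimal sketch rather than part of the original answer: it assumes the same sample data and the same thresholds and precedence as the if/elif chain above, and the names keys, sum_2_12, sum_4_14 and result are illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Same sample rows as above (space-separated).
data = StringIO("""ord_date crt_code del_date slb_qty val1
01/01/2019 125 10/01/2019 2 38
01/01/2019 125 10/01/2019 4 27
01/01/2019 125 10/01/2019 12 35
01/01/2019 128 10/01/2019 2 45
01/01/2019 128 10/01/2019 4 21
01/01/2019 128 10/01/2019 12 23
01/01/2019 128 10/01/2019 14 24
02/01/2019 125 10/01/2019 2 37
02/01/2019 125 10/01/2019 12 30
02/01/2019 125 10/01/2019 4 29
02/01/2019 128 10/01/2019 14 22
02/01/2019 128 10/01/2019 4 26
02/01/2019 128 10/01/2019 12 21
02/01/2019 128 10/01/2019 2 29
""")
df = pd.read_csv(data, sep=" ")

keys = ["ord_date", "crt_code", "del_date"]

# Keep val1 only on rows whose slb_qty is in the wanted set; other rows count as 0.
tmp = df.assign(
    sum_2_12=df["val1"].where(df["slb_qty"].isin([2, 12]), 0),
    sum_4_14=df["val1"].where(df["slb_qty"].isin([4, 14]), 0),
)

# One row per key combination, holding both sums.
sums = tmp.groupby(keys, as_index=False)[["sum_2_12", "sum_4_14"]].sum()

# np.select takes the first condition that is True for each row,
# so the order below mirrors the if/elif chain.
conditions = [
    sums["sum_4_14"] >= 80,
    sums["sum_4_14"] >= 60,
    sums["sum_2_12"] >= 80,
    sums["sum_2_12"] >= 60,
]
choices = [300, 150, 200, 100]
sums["Score"] = np.select(conditions, choices, default=0)

result = sums[keys + ["Score"]]
print(result)
```

Because np.select stops at the first matching condition, the second and fourth entries do not need an explicit upper bound of 80.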

1 Comment

It worked great, got some new things to learn.. Thanks a lot.. @Zaraki Kenpachi
