Pandas modify column values based on another DataFrame

Question

I am trying to add values to a column based on a couple of conditions. Here is the code example:

import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat':['p', 'n', 'p', 'n','p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})

for index, _ in df1.iterrows():  
    
    if df1.loc[index,'Val'] >=0:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'p'), 'Val'])
    else:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'n'), 'Val'])

For each value in the 'Val' column of df1, I want to add values from df2, based on the type and whether the original value was positive or negative.

The expected output for this example would be alternating 50 and -50 in df1. The above code does the job, but it is too slow to be usable for a large data set. Is there a better way to do this?

Henry Ecker · Accepted Answer · 2021-06-21 17:58:18Z

1

Try adding a Cat column to df1, merging with df2 on Type and Cat, summing the two Val columns across axis 1, and then dropping the extra columns:

import numpy as np

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
df1 = df1.drop(columns=['Cat', 'Val_x', 'Val_y'])

  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50

Add a new column with np.where:

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')

  Type  Val Cat
0    A   20   p
1    A  -10   n
2    A   20   p
3    A  -10   n
4    B   30   p
5    B  -20   n
6    C   40   p
7    C  -30   n

Merge on Type and Cat:

df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')

  Type  Val_x Cat  Val_y
0    A     20   p     30
1    A    -10   n    -40
2    A     20   p     30
3    A    -10   n    -40
4    B     30   p     20
5    B    -20   n    -30
6    C     40   p     10
7    C    -30   n    -20

Sum the Val columns:

df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)

  Type  Val_x Cat  Val_y  Val
0    A     20   p     30   50
1    A    -10   n    -40  -50
2    A     20   p     30   50
3    A    -10   n    -40  -50
4    B     30   p     20   50
5    B    -20   n    -30  -50
6    C     40   p     10   50
7    C    -30   n    -20  -50

Drop the extra columns:

df1 = df1.drop(columns=['Cat', 'Val_x', 'Val_y'])

  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50
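
For reuse, the four steps can be wrapped in one helper. This is only a sketch, assuming the df1 and df2 from the question; the function name add_matching_values and the validate='many_to_one' check are additions for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd


def add_matching_values(left, lookup):
    """Return a copy of `left` with the matching `lookup` value added to 'Val'."""
    out = left.copy()
    # Negative values use the 'n' rows of the lookup table, all others the 'p' rows.
    out['Cat'] = np.where(out['Val'].lt(0), 'n', 'p')
    # how='left' keeps the rows of `left` in their original order.
    # validate='many_to_one' raises if `lookup` repeats a (Type, Cat) pair,
    # which would otherwise duplicate rows of `left` without warning.
    merged = out.merge(lookup, on=['Type', 'Cat'], how='left',
                       validate='many_to_one')
    # sum(axis=1) skips NaN, so a row without a match keeps its original value.
    merged['Val'] = merged[['Val_x', 'Val_y']].sum(axis=1)
    return merged.drop(columns=['Cat', 'Val_x', 'Val_y'])


df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

result = add_matching_values(df1, df2)
print(result)
```

Working on a copy leaves the caller's df1 untouched, so the helper can be run more than once without the Cat column or the suffixed Val columns piling up.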

edited Jun 21, 2021 at 17:58

answered Jun 21, 2021 at 15:42


3 Comments

Ayush Over a year ago

Thank you. Merging the data frames seems to have re-sequenced it. Is there a way to retain the original sequence? I need to join the data frame with another one downstream, and re-sequencing may be a problem in that case.

Henry Ecker Over a year ago

You could try a left merge: df1 = df1.merge(df2, on=['Type', 'Cat'], how='left'), which should maintain the order of df1. I can't replicate a re-sequencing on my end; if you provide a small amount of sample data that replicates the problem, I could look into it in more detail.

Ayush Over a year ago

The re-sequencing can be seen in this data itself. If the original sequence was maintained, the first four rows of output should have been 50,-50,50,-50, but the output is 50,50,-50,-50. Anyway, I took care of the downstream problem by doing a merge instead of a join. This solution worked brilliantly. Thank you.
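
If the row order and the index of df1 both matter downstream, one option that sidesteps merge entirely is to look the values up with Series.map, which returns a result aligned to df1's own index. This is a sketch rather than anything from the answer above. It assumes every Type in df1 has both a 'p' and an 'n' row in df2 and that each (Type, Cat) pair appears only once; the names wide, pos and neg are made up for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

# One row per Type, one column per Cat ('n' and 'p').
# pivot raises ValueError if a (Type, Cat) pair is duplicated in df2.
wide = df2.pivot(index='Type', columns='Cat', values='Val')

# Look up both candidate adjustments for every row of df1; map keeps df1's index.
pos = df1['Type'].map(wide['p'])
neg = df1['Type'].map(wide['n'])

# Pick the 'n' adjustment for negative values and the 'p' adjustment otherwise.
df1['Val'] = df1['Val'] + np.where(df1['Val'].lt(0), neg, pos)
print(df1)
```

Because nothing is merged, df1 keeps its row order and index labels, so a later join on the index still lines up.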

tomtomfox · Accepted Answer · 2021-06-21 15:41:00Z

1

import numpy as np

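# Tag every row in both frames with the sign of its value (-1, 0 or 1)
df1['sign'] = np.sign(df1.Val)
df2['sign'] = np.sign(df2.Val)
# Match rows on Type and sign; the suffixes keep the two Val columns apart
df = pd.merge(df1, df2, on=['Type', 'sign'], suffixes=('_df1', '_df2'))
# Add the matched values, then drop the helper columns
df['Val'] = df.Val_df1 + df.Val_df2
df = df.drop(columns=['Val_df1', 'sign', 'Val_df2'])
df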
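
One thing to watch with a sign-based key: np.sign returns 0 for a zero value, whereas the loop in the question treats zero as positive (Val >= 0). The sample data has no zeros, so this does not show up there. A small sketch of the difference, using a made-up Series vals and a hypothetical replacement key:

```python
import numpy as np
import pandas as pd

# Made-up values, including a zero, purely for illustration.
vals = pd.Series([20, -10, 0])

# np.sign maps 0 to 0, which matches neither the +1 nor the -1 rows of df2.
signs = np.sign(vals)
print(signs.tolist())

# Hypothetical alternative key that sends zero to the positive group,
# mirroring the `>= 0` test in the question's loop.
key = np.where(vals >= 0, 1, -1)
print(key.tolist())
```

With the default inner merge, a row whose key matches nothing in df2 is dropped from the result, so data that can contain zeros would need a key along these lines.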

answered Jun 21, 2021 at 15:41
