0

I am trying to add values to a column based on a couple of conditions. Here is the code example:

Import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat':['p', 'n', 'p', 'n','p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})

for index, _ in df1.iterrows():  
    
    if df1.loc[index,'Val'] >=0:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'p'), 'Val'])
    else:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'n'), 'Val'])

For each value in the 'Val' column of df1, I want to add values from df2, based on the type and whether the original value was positive or negative.

The expected output for this example would be alternate 50 and -50 in df1. The above code does the job, but is too slow to be usable for a large data set. Is there a better way to do this?

2 Answers 2

1

Try adding a Cat column to df1 merge then sum val columns across axis 1 then drop the extra columns:

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
  Type  Val
0    A   50
1    A   50
2    A  -50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50

Add new column with np.where

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
  Type  Val Cat
0    A   20   p
1    A  -10   n
2    A   20   p
3    A  -10   n
4    B   30   p
5    B  -20   n
6    C   40   p
7    C  -30   n

merge on Type and Cat

df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
  Type  Val_x Cat  Val_y
0    A     20   p     30
1    A    -10   n    -40
2    A     20   p     30
3    A    -10   n    -40
4    B     30   p     20
5    B    -20   n    -30
6    C     40   p     10
7    C    -30   n    -20

sum Val columns:

df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
  Type  Val_x Cat  Val_y  Val
0    A     20   p     30   50
1    A    -10   n    -40  -50
2    A     20   p     30   50
3    A    -10   n    -40  -50
4    B     30   p     20   50
5    B    -20   n    -30  -50
6    C     40   p     10   50
7    C    -30   n    -20  -50

drop extra columns:

df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you. Merging the data frames seems to have re-sequenced it. Is there a way to retain the original sequence? I need to join the data frame with another one down stream. Re-sequencing may be a problem in that case.
You could try a left merge. df1 = df1.merge(df2, on=['Type', 'Cat'], how='left') which should maintain the order of df1. I can't replicate a re-sequencing on my end, if you would provide a small amount of sample data that replicates the problem I could look into more detail.
The re-sequencing can be seen in this data itself. If the original sequence was maintained, the first four rows of output should have been 50,-50,50,-50, but the output is 50,50,-50,-50. Anyways, I took care of the down stream problem by doing a merge instead of join. This solution worked brilliantly. Thank you.
1
import numpy as np

df1['sign'] = np.sign(df1.Val)
df2['sign'] = np.sign(df2.Val)
df = pd.merge(df1, df2, on=['Type', 'sign'], suffixes=('_df1', '_df2'))
df['Val'] = df.Val_df1 + df.Val_df2
df = df.drop(columns=['Val_df1', 'sign', 'Val_df2'])
df

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.