Pandas data frame compare and replace values

Question

I have two pandas data frames like the ones below. The column 'No' is a common field. Based on 'No', I want to replace the values in the 'Total' column of the first data frame.

The condition is: wherever 'No' matches, take the 'Marks1' value from data frame2 and put it in the 'Total' column. If 'Marks1' is null, take the 'Marks2' value instead. If both Marks1 and Marks2 are null, put null in the 'Total' column. The final result should be in data frame1. Both data frames have a few hundred thousand records.

Data frame1
No|Total
1234|11
2515|21
3412|32
4854|
7732|53

Data frame2
No|Marks1|Marks2
1234|99|23
2515|98|31
3412||20
4854||98
7732||

Result:
No|Total
1234|99
2515|98
3412|20
4854|98
7732|
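To try the answers below, the sample tables can be rebuilt as DataFrames. This is a minimal sketch that assumes the blank cells are missing values (NaN) and reuses the names df1 and df2 from the answers; the expected result is built the same way so it can be compared against.

```python
import numpy as np
import pandas as pd

# Data frame1: the blank 'Total' cell becomes NaN
df1 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Total': [11, 21, 32, np.nan, 53]})

# Data frame2: blank 'Marks1'/'Marks2' cells become NaN
df2 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Marks1': [99, 98, np.nan, np.nan, np.nan],
                    'Marks2': [23, 31, 20, 98, np.nan]})

# Expected result from the question
expected = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                         'Total': [99, 98, 20, 98, np.nan]})
```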

jezrael · Accepted Answer · 2020-06-09 13:30:22Z


Use Series.map, replacing missing values in Marks1 with Marks2 via Series.fillna:

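#index df2 by No so its columns can be used as lookup Series
df = df2.set_index('No')

#Marks1, falling back to Marks2 where Marks1 is missing, looked up for each No in df1
df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))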
print (df1)
     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
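A self-contained version of this approach, with the sample data rebuilt under the assumption that blank cells are NaN. Series.combine_first is shown only as an equivalent way to say "prefer Marks1, fall back to Marks2"; it is not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Total': [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Marks1': [99, 98, np.nan, np.nan, np.nan],
                    'Marks2': [23, 31, 20, 98, np.nan]})

# Lookup Series indexed by 'No' (must be unique in df2 for map to work):
# Marks1, with Marks2 filling the gaps where Marks1 is NaN
lookup = df2.set_index('No')
best = lookup['Marks1'].fillna(lookup['Marks2'])

# Same values via combine_first (Series names may differ, so ignore them)
pd.testing.assert_series_equal(
    best, lookup['Marks1'].combine_first(lookup['Marks2']), check_names=False)

# map looks up each df1 'No' in best's index; numbers not found give NaN
df1['Total'] = df1['No'].map(best)
print(df1)
```

Because map works on labels, it does not care about row order or about df1 and df2 having different lengths, which matters with a few hundred thousand rows.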

If 'No' can contain duplicated values in df2, aggregate the duplicates first, then map:

print (df2)
     No  Marks1  Marks2
0  1234    99.0    23.0 <- duplicated No
1  1234    98.0    31.0 <- duplicated No
2  3412     NaN    20.0
3  4854     NaN    98.0
4  7732     NaN     NaN

#pandas 2.0+ (sum(level=...) was removed, use groupby)
df = df2.groupby('No').sum(min_count=1)
#pandas 0.22 to 1.x
#df = df2.set_index('No').sum(level=0, min_count=1)
#older pandas versions
#df = df2.set_index('No').sum(level=0)
print (df)
      Marks1  Marks2
No                  
1234   197.0    54.0 <- unique No, duplicate rows summed per No
3412     NaN    20.0
4854     NaN    98.0
7732     NaN     NaN

df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))
print (df1)
     No  Total
0  1234  197.0
1  2515    NaN
2  3412   20.0
3  4854   98.0
4  7732    NaN
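A runnable sketch of the duplicated-No case written for current pandas (2.x), where DataFrame.sum no longer accepts level=. Summing the duplicates is the answer's choice; another aggregation such as first could be substituted if that fits the data better.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Total': [11, 21, 32, np.nan, 53]})
# df2 with 1234 appearing twice, as in the example above
df2 = pd.DataFrame({'No': [1234, 1234, 3412, 4854, 7732],
                    'Marks1': [99, 98, np.nan, np.nan, np.nan],
                    'Marks2': [23, 31, 20, 98, np.nan]})

# One row per 'No'; min_count=1 keeps all-NaN groups as NaN instead of 0
agg = df2.groupby('No').sum(min_count=1)

# Marks1 with Marks2 as fallback, looked up for each No in df1
df1['Total'] = df1['No'].map(agg['Marks1'].fillna(agg['Marks2']))
print(df1)
```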

If df1 and df2 have the same index values and the No values match row by row, use:

df1['Total'] = df2['Marks1'].fillna(df2['Marks2'])
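Assigning a Series to a column aligns on index labels, not on the values in No, so this shortcut is only correct when each index label points at the same No in both frames. A sketch that checks that assumption before relying on it; the guard is an addition, not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Total': [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Marks1': [99, 98, np.nan, np.nan, np.nan],
                    'Marks2': [23, 31, 20, 98, np.nan]})

# Guard: identical index labels and identical 'No' per label
if not (df1.index.equals(df2.index) and df1['No'].equals(df2['No'])):
    raise ValueError("df1 and df2 are not aligned on 'No'; use map instead")

# Index-aligned assignment: Marks1, falling back to Marks2
df1['Total'] = df2['Marks1'].fillna(df2['Marks2'])
print(df1)
```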

edited Jun 9, 2020 at 13:30

answered Jun 9, 2020 at 11:59

jezrael


1 Comment

Samuel Liew Over a year ago

Comments are not for extended discussion; this conversation has been moved to chat.

Ch3steR · Accepted Answer · 2020-06-09 12:07:45Z


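You can use np.select here. Note that np.select works by position, not by 'No', so this assumes df1 and df2 have the same length and the same row order.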

import numpy as np

#Marks1 present / only Marks2 present
m = df2['Marks1'].notna()
m1 = df2['Marks1'].isna() & df2['Marks2'].notna()
condlist = [m, m1]
choice = [df2['Marks1'], df2['Marks2']]
#NaN when neither condition holds
df1['Total'] = np.select(condlist, choice, np.nan)

     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
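Since np.select pairs rows by position, a safer variant first lines df2 up with df1 on 'No'. This is a sketch under that assumption: the reindex step is an addition, not part of the answer, and it still requires 'No' to be unique in df2. Here df2 is deliberately stored in a different order than df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'No': [1234, 2515, 3412, 4854, 7732],
                    'Total': [11, 21, 32, np.nan, 53]})
# Same data as the question, but rows in reverse order
df2 = pd.DataFrame({'No': [7732, 4854, 3412, 2515, 1234],
                    'Marks1': [np.nan, np.nan, np.nan, 98, 99],
                    'Marks2': [np.nan, 98, 20, 31, 23]})

# One row per df1 row, matched on 'No'; numbers missing from df2 become NaN rows
aligned = df2.set_index('No').reindex(df1['No'])

m = aligned['Marks1'].notna()                               # Marks1 present
m1 = aligned['Marks1'].isna() & aligned['Marks2'].notna()   # only Marks2 present
condlist = [m, m1]
choice = [aligned['Marks1'], aligned['Marks2']]

# Plain array in df1's row order; NaN when neither condition holds
df1['Total'] = np.select(condlist, choice, default=np.nan)
print(df1)
```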

edited Jun 9, 2020 at 12:07

answered Jun 9, 2020 at 12:03

Ch3steR


4 Comments

DavideBrex Over a year ago

I think you forgot to include condlist and choice

RK. Over a year ago

Hi, I'm getting the error: AttributeError: 'Series' object has no attribute 'notna'

Ch3steR Over a year ago

@RK. You need to upgrade your pandas version. .notna and .isna were added in pandas 0.21.0 (older versions provide .notnull and .isnull).
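For pandas versions before 0.21.0, which lack .notna and .isna, the masks in the answer can be written with .notnull() and .isnull(); these spellings are long-standing and still available in current pandas. A quick check that the two spellings agree:

```python
import numpy as np
import pandas as pd

s = pd.Series([99, np.nan, 20])

# isnull/notnull predate isna/notna and behave identically
assert s.notnull().equals(s.notna())
assert s.isnull().equals(s.isna())

# The top-level functions give the same element-wise result on a Series
assert pd.notnull(s).equals(s.notna())
```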

RK. Over a year ago

I upgraded to pandas 0.22 and tried again. Now np.select throws this error: raise ValueError('Length of values does not match length of ' 'index') ValueError: Length of values does not match length of index
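This error is consistent with df1 and df2 having different numbers of rows: np.select returns one value per row of df2, and assigning an array of a different length to a df1 column raises ValueError. A small reproduction with hypothetical frames (not the asker's data), followed by a key-based lookup that does not depend on length or order:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: df2 has an extra No (9999) that df1 lacks
df1 = pd.DataFrame({'No': [1234, 2515], 'Total': [11, 21]})
df2 = pd.DataFrame({'No': [1234, 2515, 9999],
                    'Marks1': [99, np.nan, 50],
                    'Marks2': [23, 31, np.nan]})

m = df2['Marks1'].notna()
m1 = df2['Marks1'].isna() & df2['Marks2'].notna()
values = np.select([m, m1], [df2['Marks1'], df2['Marks2']], default=np.nan)

try:
    df1['Total'] = values  # 3 values for a 2-row frame
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Match on 'No' instead of position
lookup = df2.set_index('No')
df1['Total'] = df1['No'].map(lookup['Marks1'].fillna(lookup['Marks2']))
print(df1)
```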
