I have two pandas DataFrames like the ones below. The column 'No' is common to both. Based on 'No', I want to replace the values in the 'Total' column of the first DataFrame.

The condition is: wherever 'No' matches, take the 'Marks1' value from the second DataFrame and write it into 'Total'. If 'Marks1' is null, take the 'Marks2' value instead. If both 'Marks1' and 'Marks2' are null, set 'Total' to null. The final result should be in the first DataFrame. Both DataFrames have a few hundred thousand records.

Data frame1
No|Total
1234|11
2515|21
3412|32
4854|
7732|53

Data frame2
No|Marks1|Marks2
1234|99|23
2515|98|31
3412||20
4854||98
7732||

Result :
No|Total
1234|99
2515|98
3412|20
4854|98
7732|

2 Answers

Use Series.map, after filling the missing values of Marks1 with Marks2 via Series.fillna:

df = df2.set_index('No')

df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))
print (df1)
     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
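
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The frames are rebuilt by hand from the sample data in the question, and the names lookup and marks are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells become NaN.
df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Total": [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# Index df2 by No so it can act as a lookup table.
lookup = df2.set_index("No")

# Prefer Marks1; fall back to Marks2 where Marks1 is missing.
marks = lookup["Marks1"].fillna(lookup["Marks2"])

# Translate each No in df1 into its mark; unmatched No values become NaN.
df1["Total"] = df1["No"].map(marks)
print(df1)
```

Because the lookup goes through the No values rather than row positions, this should not depend on df1 and df2 being in the same order.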

If No can contain duplicated values in df2, aggregate them first:

print (df2)
     No  Marks1  Marks2
0  1234    99.0    23.0 <- duplicated No
1  1234    98.0    31.0 <- duplicated No
2  3412     NaN    20.0
3  4854     NaN    98.0
4  7732     NaN     NaN

# current pandas versions (sum(level=...) was removed in pandas 2.0)
df = df2.groupby('No').sum(min_count=1)
# older pandas versions
#df = df2.set_index('No').sum(level=0)
print (df)
      Marks1  Marks2
No                  
1234   197.0    54.0 <- unique No, values are summed per index created by No
3412     NaN    20.0
4854     NaN    98.0
7732     NaN     NaN

df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))
print (df1)
     No  Total
0  1234  197.0
1  2515    NaN
2  3412   20.0
3  4854   98.0
4  7732    NaN
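
Summing is only one way to resolve duplicates. If the first row per No should win instead, a sketch like the following might work. This is an assumption about what is wanted, not something the question specifies, and the name first is only illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732]})
df2 = pd.DataFrame({"No": [1234, 1234, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# Keep only the first row for each No, then use No as the lookup index.
first = df2.drop_duplicates("No", keep="first").set_index("No")

# Same map/fillna pattern as above, now on a unique index.
df1["Total"] = df1["No"].map(first["Marks1"].fillna(first["Marks2"]))
print(df1)
```

Passing keep="last" would keep the last row per No instead.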

If df1 and df2 have the same index and their No values match row by row, use:

df1['Total'] = df2['Marks1'].fillna(df2['Marks2'])

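You can use np.select here, provided df1 and df2 have the same length and row order, because the result is assigned by position:

import numpy as np

# first condition: Marks1 is present; second: only Marks2 is present
m = df2['Marks1'].notna()
m1 = df2['Marks1'].isna() & df2['Marks2'].notna()
condlist = [m, m1]
choice = [df2['Marks1'], df2['Marks2']]
# rows matching neither condition get NaN
df1['Total'] = np.select(condlist, choice, np.nan)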

     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
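
If the two frames differ in length or row order, the positional assignment above can fail or put values on the wrong rows. One possible workaround, sketched here under the assumption that No is unique in df2, is to align df2 to df1 with a left merge before calling np.select. The name aligned and the sample df1 are illustrative:

```python
import numpy as np
import pandas as pd

# df1 deliberately has a different order and a No that df2 lacks.
df1 = pd.DataFrame({"No": [7732, 1234, 3412, 9999]})
df2 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# A left merge keeps one row per df1 row, in df1's order
# (assuming No is unique in df2).
aligned = df1[["No"]].merge(df2, on="No", how="left")

# np.select picks the first condition that holds for each row.
condlist = [aligned["Marks1"].notna(), aligned["Marks2"].notna()]
choicelist = [aligned["Marks1"], aligned["Marks2"]]
df1["Total"] = np.select(condlist, choicelist, default=np.nan)
print(df1)
```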

4 Comments

I think you forgot to include condlist and choice.
Hi, I'm getting the error: AttributeError: 'Series' object has no attribute 'notna'
@RK. You need to upgrade your pandas version. .notna and .isna were added in pandas 0.21.0; on older versions use .notnull and .isnull.
I upgraded to pandas 0.22 and tried again. np.select now throws the error below: ValueError: Length of values does not match length of index
