Replacing missing values in a Pandas DataFrame based on values in another Pandas DataFrame

Question

I need to replace missing values in a Pandas DataFrame using values from another DataFrame.

df1 = pd.DataFrame({'ID':['1111','2222','3333','4444','5555'],'Test':['T1','T1','T1','T2','T2'], 'Day1': ['P','P','P','P','P'], 'Day2': ['P','P','P','P', 'NaN'], 'Day3':['P','P','NaN','P','NaN'], 'Day4': ['P','P','NaN','P','NaN']})

    ID   Test Day1 Day2 Day3 Day4
0   1111 T1   P    P    P    P
1   2222 T1   P    P    P    P
2   3333 T1   P    P    NaN  NaN
3   4444 T2   P    P    P    P
4   5555 T2   P    NaN  NaN  NaN

df2 = pd.DataFrame({'ID':['3333','5555'], 'Test':['T1','T2'], 'Label': ['OOT-P', 'OOT-T']})

    ID   Test Label
0   3333 T1   OOT-P
1   5555 T2   OOT-T

NaN for Day3 and Day4 for ID=3333 and test=T1 in df1 needs to be replaced with OOT-P from df2.

NaN for Day2, Day3, Day4 for ID=5555 and test=T2 in df1 needs to be replace with OOT-T from df2.

Both dataframes will always have an ID and Test column but the names and number of additional columns in df1 will change. For example, df1 may have Day1, Day2, Day3, Day4, Day5 columns or Week1, Week2, Week3 columns.

mozway · Accepted Answer · 2021-10-30 03:40:49Z

3

You can reshape df2 as Series to fillna in df1.

A few things are needed: replacing the string 'NaN' with actual float NaN, temporarily setting ID/Test as index, and temporarily transposing the dataframe (fillna with Series/dictionary only works on columns)

(df1.replace('NaN', float('nan'))
    .set_index(['ID','Test']).T
    .fillna(df2.set_index(['ID','Test'])['Label'])
    .T.reset_index()
 )

Output:

     ID Test Day1   Day2   Day3   Day4
0  1111   T1    P      P      P      P
1  2222   T1    P      P      P      P
2  3333   T1    P      P  OOT-P  OOT-P
3  4444   T2    P      P      P      P
4  5555   T2    P  OOT-T  OOT-T  OOT-T

answered Oct 30, 2021 at 3:40

mozway

267k13 gold badges56 silver badges106 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

rastawolf Over a year ago

This is a much more elegant solution

rastawolf · Accepted Answer · 2021-10-30 04:47:43Z

You could also make a list of tuples with the IDs that you want to change, and the values that you want to change them to. Then you can simply iterate through the IDs and replace the NaN values with your new values.

IDS_to_Change = df2.ID.values.tolist()

New_Vals = []
for i in range(len(df2.index)):
    New_Vals.append(df2[df2['ID'] == IDS_to_Change[i]].Label[i])

Data_to_Merge = list(zip(IDS_to_Change, New_Vals))

for ids in Data_to_Merge:
    for i in range (len(df1.index)):
        if df1['ID'][i] == ids[0]:
            id_name = ids[0]          
            index = df1[df1['ID'] == id_name].index[0]
            for j in range(len(df1[df1['ID'] == id_name].columns)):
                 if df1[df1['ID'] == id_name].loc[index][j] == 'NaN':
                    df1.iloc[index][j] = ids[1]

Output:

    ID    Test  Day1    Day2    Day3    Day4
0   1111    T1     P       P       P       P
1   2222    T1     P       P       P       P
2   3333    T1     P       P   OOT-P   OOT-P
3   4444    T2     P       P       P       P
4   5555    T2     P   OOT-T   OOT-T   OOT-T

Collectives™ on Stack Overflow

Replacing missing values in a Pandas DataFrame based on values in another Pandas DataFrame

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related