1

I need to replace missing values in a Pandas DataFrame using values from another DataFrame.

df1 = pd.DataFrame({'ID':['1111','2222','3333','4444','5555'],'Test':['T1','T1','T1','T2','T2'], 'Day1': ['P','P','P','P','P'], 'Day2': ['P','P','P','P', 'NaN'], 'Day3':['P','P','NaN','P','NaN'], 'Day4': ['P','P','NaN','P','NaN']})
    ID   Test Day1 Day2 Day3 Day4
0   1111 T1   P    P    P    P
1   2222 T1   P    P    P    P
2   3333 T1   P    P    NaN  NaN
3   4444 T2   P    P    P    P
4   5555 T2   P    NaN  NaN  NaN

df2 = pd.DataFrame({'ID':['3333','5555'], 'Test':['T1','T2'], 'Label': ['OOT-P', 'OOT-T']})
    ID   Test Label
0   3333 T1   OOT-P
1   5555 T2   OOT-T

NaN for Day3 and Day4 for ID=3333 and test=T1 in df1 needs to be replaced with OOT-P from df2.

NaN for Day2, Day3, Day4 for ID=5555 and test=T2 in df1 needs to be replace with OOT-T from df2.

Both dataframes will always have an ID and Test column but the names and number of additional columns in df1 will change. For example, df1 may have Day1, Day2, Day3, Day4, Day5 columns or Week1, Week2, Week3 columns.

2 Answers 2

3

You can reshape df2 as Series to fillna in df1.

A few things are needed: replacing the string 'NaN' with actual float NaN, temporarily setting ID/Test as index, and temporarily transposing the dataframe (fillna with Series/dictionary only works on columns)

(df1.replace('NaN', float('nan'))
    .set_index(['ID','Test']).T
    .fillna(df2.set_index(['ID','Test'])['Label'])
    .T.reset_index()
 )

Output:

     ID Test Day1   Day2   Day3   Day4
0  1111   T1    P      P      P      P
1  2222   T1    P      P      P      P
2  3333   T1    P      P  OOT-P  OOT-P
3  4444   T2    P      P      P      P
4  5555   T2    P  OOT-T  OOT-T  OOT-T
Sign up to request clarification or add additional context in comments.

1 Comment

This is a much more elegant solution
0

You could also make a list of tuples with the IDs that you want to change, and the values that you want to change them to. Then you can simply iterate through the IDs and replace the NaN values with your new values.

IDS_to_Change = df2.ID.values.tolist()

New_Vals = []
for i in range(len(df2.index)):
    New_Vals.append(df2[df2['ID'] == IDS_to_Change[i]].Label[i])

Data_to_Merge = list(zip(IDS_to_Change, New_Vals))

for ids in Data_to_Merge:
    for i in range (len(df1.index)):
        if df1['ID'][i] == ids[0]:
            id_name = ids[0]          
            index = df1[df1['ID'] == id_name].index[0]
            for j in range(len(df1[df1['ID'] == id_name].columns)):
                 if df1[df1['ID'] == id_name].loc[index][j] == 'NaN':
                    df1.iloc[index][j] = ids[1]

Output:

    ID    Test  Day1    Day2    Day3    Day4
0   1111    T1     P       P       P       P
1   2222    T1     P       P       P       P
2   3333    T1     P       P   OOT-P   OOT-P
3   4444    T2     P       P       P       P
4   5555    T2     P   OOT-T   OOT-T   OOT-T

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.