
I'm following the answer from this question

I have a df like this:

score_1   score_2  
1.11        NaN      
2.22        3.33      
NaN         3.33      
NaN         NaN
........       

The rule for calculating final_score is that at least one of the scores must be non-null. If one of the scores is NULL, final_score equals the other score (which then carries all the weight). This is the code to replicate:

import numpy as np
import pandas as pd

df = pd.DataFrame({
            'score_1': [1.11, 2.22, np.nan],
            'score_2': [np.nan, 3.33, 3.33]
        })

def final_score(df):
    if (df['score_1'] != np.nan) and (df['score_2'] != np.nan):
        print('I am condition one')
        return df['score_1'] * 0.2 + df['score_2'] * 0.8

    elif (df['score_1'] == np.nan) and (df['score_2'] != np.nan):
        print('I am the condition two')
        return df['score_2']

    elif (df['score_1'] != np.nan) and (df['score_2'] == np.nan):
        print('I am the condition three')
        return df['score_1']

    elif (df['score_1'] == np.nan) and (df['score_2'] == np.nan):
        print('I am the condition four')
        return np.nan

df['final_score'] = df.apply(final_score, axis=1)
print(df)

This gave me output:

score_1   score_2  final_score
1.11        NaN       NaN
2.22        3.33      3.108
NaN         3.33      NaN
NaN         NaN       NaN
........ 

But my expected output is below:

score_1   score_2  final_score
1.11        NaN       1.11
2.22        3.33      3.108
NaN         3.33      3.33
NaN         NaN       NaN
........ 

The first and third rows are not what I'm expecting. Can someone help me find what's wrong with my code? Thanks a lot.

2 Answers

3

Let's apply your conditions using np.where:

df['final_score'] = np.where(df.notna().all(1), df['score_1'] * 0.2 + df['score_2'] * 0.8, df.mean(1))



   score_1  score_2  final_score
0     1.11      NaN        1.110
1     2.22     3.33        3.108
2      NaN     3.33        3.330
3      NaN      NaN          NaN

7 Comments

Hi thanks, this is much more simplified, but not very readable for people not familiar with what I'm doing, just wondering why we use df.mean(1) here?
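df.mean(1) takes the mean along axis 1, that is, across each row. NaN values are skipped by default, so when only one score is present the row mean is just that score, and when both are missing it is NaN. np.where(condition, value used where the condition is met, value used where it is not)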
Hi, if I still want to use my original code, how can I update it to make it work? Your one-line code doesn't seem to work for my 'large' dataset, where there are lots of other columns, so I still want to figure out what the issue is in my original code. Thanks.
np.where is vectorized and much faster than a row-wise apply, which matters unless speed and compute resources are a non-issue
df['final_score'] = np.where(df[['score_1','score_2']].notna().all(1), df['score_1'] * 0.2 + df['score_2'] * 0.8, df[['score_1','score_2']].mean(1)). Try that and let me know. Typing on phone so didn't test. Basically subsetting to the two score columns, both to check that neither of them is NaN and to take the row mean, so the other columns are ignored.
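The np.where approach from this answer and its comments can be sketched as follows for a frame that has extra columns. This is a minimal sketch, not a tested fit for the asker's real dataset: the `other` column and the intermediate names `scores`, `both_present`, `weighted` and `fallback` are made up for illustration, and a fourth all-NaN row is added to match the table in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'score_1': [1.11, 2.22, np.nan, np.nan],
    'score_2': [np.nan, 3.33, 3.33, np.nan],
    'other': [10.0, 20.0, 30.0, 40.0],  # hypothetical extra column, not in the question
})

# Work only on the two score columns so unrelated columns are ignored.
scores = df[['score_1', 'score_2']]

# True for rows where both scores are present.
both_present = scores.notna().all(axis=1)

# Weighted score; NaN wherever either score is missing.
weighted = df['score_1'] * 0.2 + df['score_2'] * 0.8

# Row mean skips NaN by default, so a single present score is returned as-is
# and a row with both scores missing gives NaN.
fallback = scores.mean(axis=1)

df['final_score'] = np.where(both_present, weighted, fallback)
print(df)
```

Restricting both the notna() check and the mean to the score columns keeps the result independent of whatever else is in the frame.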
0

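Using np.isnan() for the comparison should solve the problem. NaN never compares equal to anything, including itself, so `== np.nan` is always False and `!= np.nan` is always True.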
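A small check of how NaN behaves under comparison shows why the first branch of the original function runs for every row: both `!=` tests are True even when a score is missing, and the weighted sum then propagates NaN. The variable names here are illustrative only.

```python
import numpy as np

x = np.nan

eq = (x == np.nan)     # equality with NaN is never True
neq = (x != np.nan)    # inequality with NaN is always True
is_nan = np.isnan(x)   # the reliable way to test for NaN

print(eq, neq, is_nan)

# A non-missing value also satisfies `!= np.nan`, so this test cannot
# tell missing and present scores apart.
print(1.11 != np.nan)
```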

2 Comments

Hi, can you be more specific? I tried elif df['score_1'].isnan() and df['score_2'].notnan(): but it's not working.
What I meant was to use np.isnan() in the following way: elif (np.isnan(df['score_1']) and not np.isnan(df['score_2'])):
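Putting that suggestion into the original row-wise function might look like the sketch below. It uses pd.isna rather than np.isnan, which is a substitution made here (pd.isna also handles None and non-float values); np.isnan would behave the same on these float columns. The parameter is renamed to `row` because apply with axis=1 passes one row at a time, and a fourth all-NaN row is added to match the table in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'score_1': [1.11, 2.22, np.nan, np.nan],
    'score_2': [np.nan, 3.33, 3.33, np.nan],
})

def final_score(row):
    # Test for missing values explicitly instead of comparing with np.nan.
    s1_missing = pd.isna(row['score_1'])
    s2_missing = pd.isna(row['score_2'])

    if not s1_missing and not s2_missing:
        # Both scores present: weighted combination.
        return row['score_1'] * 0.2 + row['score_2'] * 0.8
    elif s1_missing and not s2_missing:
        # Only score_2 present: it carries all the weight.
        return row['score_2']
    elif not s1_missing and s2_missing:
        # Only score_1 present: it carries all the weight.
        return row['score_1']
    # Both scores missing.
    return np.nan

df['final_score'] = df.apply(final_score, axis=1)
print(df)
```

This keeps the four explicit branches of the original code, at the cost of being slower than the vectorized np.where version on large frames.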
