I have the following function:

def create_col4(df):
    df['col4'] = df['col1'] + df['col2']

If I call this function in my Jupyter notebook, as in

create_col4(df_test)

then col4 is added to df_test persistently.

However, if I have the following code where I apply a numpy function:

import numpy as np
def create_col4(df):
    df['col4'] = np.where(df['col1'] == 1, True, False)

then calling

create_col4(df_test)

neither adds col4 to df_test persistently nor throws an error.

Why is this?


Here is the full use-case code, in case the cause lies in the specifics:

working:

def create_leg(df):
    df['leg'] = df["dep_ap_sched"] + "-" + df["arr_ap_sched"]

also working when run directly in the Jupyter notebook:

df['rot_mismatch'] = np.where(
    df['ac_registration_x'].shift(-1).eq(df['ac_registration_x']) == True, 
    ~df['dep_ap_sched'].shift(-1).eq(df['arr_ap_sched']), 
    False 
)

not working:

create_rotmismatch(some_df), where

def create_rotmismatch(df):
    df['rot_mismatch'] = np.where(
        df['ac_registration_x'].shift(-1).eq(df['ac_registration_x']) == True, 
        ~df['dep_ap_sched'].shift(-1).eq(df['arr_ap_sched']), 
        False 
    )

1 Answer

import numpy as np
def create_col4(df_test):
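    df['col4'] = np.where(df['col1'] == 1, True, False)

This is the first thing I noticed, without inspecting further: the parameter is named df_test, but the function body uses df. The names are mixed, so the body operates on whatever global df exists in the notebook rather than on the argument you pass in, which would also explain why no error is raised.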

Change it to:

import numpy as np
def create_col4(df):
    df['col4'] = np.where(df['col1'] == 1, True, False)
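
To illustrate why the mixed names fail silently, here is a minimal sketch. The sample data and the function names create_col4_buggy and create_col4_fixed are made up for this example, and it assumes a leftover global called df exists in the notebook session, which is common but not confirmed by the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"col1": [1, 2]})          # a leftover global named df
df_test = pd.DataFrame({"col1": [1, 0, 1]})  # the frame we actually want to change

def create_col4_buggy(df_test):
    # The body ignores its parameter and resolves df to the global instead.
    df["col4"] = np.where(df["col1"] == 1, True, False)

def create_col4_fixed(df):
    # The parameter name matches the name used in the body.
    df["col4"] = np.where(df["col1"] == 1, True, False)

create_col4_buggy(df_test)
print("col4" in df_test.columns)  # False: df_test was never touched
print("col4" in df.columns)       # True: the global df was modified instead

create_col4_fixed(df_test)
print(df_test["col4"].tolist())   # [True, False, True]
```

If no global df existed at all, the buggy version would raise a NameError instead of passing silently.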

As for your other concern, try returning df at the end of your function and assigning the result back:

def create_rotmismatch(df):
    df['rot_mismatch'] = np.where(
        df['ac_registration_x'].shift(-1).eq(df['ac_registration_x']) == True, 
        ~df['dep_ap_sched'].shift(-1).eq(df['arr_ap_sched']), 
        False 
    )
    return df

df = create_rotmismatch(df)
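
As a rough check of the return pattern, here is a small self-contained sketch. The flights data, the intermediate variable names, and the helper create_leg_without_return are invented for illustration; only the column names come from the question. It also shows that assigning the result of a function that has no return statement yields None.

```python
import numpy as np
import pandas as pd

def create_rotmismatch(df):
    # True where the next row belongs to the same aircraft.
    same_aircraft = df["ac_registration_x"].shift(-1).eq(df["ac_registration_x"])
    # True where the next departure airport differs from this arrival airport.
    next_dep_differs = ~df["dep_ap_sched"].shift(-1).eq(df["arr_ap_sched"])
    df["rot_mismatch"] = np.where(same_aircraft, next_dep_differs, False)
    return df

def create_leg_without_return(df):
    # Mutates df in place but returns nothing (implicitly None).
    df["leg"] = df["dep_ap_sched"] + "-" + df["arr_ap_sched"]

# Hypothetical flights, for illustration only.
flights = pd.DataFrame({
    "ac_registration_x": ["A", "A", "A", "B"],
    "dep_ap_sched": ["X", "Y", "W", "Q"],
    "arr_ap_sched": ["Y", "Z", "V", "R"],
})

flights = create_rotmismatch(flights)
print(flights["rot_mismatch"].tolist())  # [False, True, False, False]

result = create_leg_without_return(flights)
print(result is None)                    # True: do not reassign from this one
```

After editing a function in a notebook, the defining cell has to be run again; otherwise the kernel keeps calling the old definition.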

5 Comments

Thank you for your answer! df/df_test was a typo here; I corrected it, thanks. Unfortunately the return statement doesn't help, as I then get None back if I apply the function the way you described.
Oh, that's weird. Could you provide some of your data to test with? Or did you find the solution?
Apparently I hadn't restarted the kernel earlier (or whatever else I did in the meantime), but I retried and now it works using your proposed return solution! (And I feel double-dumb hahaha) Thanks!
Classic on/off :D I searched a bit for changing DataFrames inside functions and found this answer. Some operations modify the original DataFrame, while others do not and create a new object instead. That new object lives in the function's scope, so you have to return it if you want access to it. To be safe, just return your df at the end.
yea haha :D that was an interesting read, thank you!
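
The distinction raised in the comments, between changing the passed DataFrame in place and creating a new object inside the function, can be sketched roughly as follows. The three function names and the sample data are hypothetical; DataFrame.assign is used here only as one example of a pandas method that returns a new object.

```python
import pandas as pd

def add_in_place(df):
    # Item assignment mutates the very object the caller passed in.
    df["col4"] = df["col1"] + df["col2"]

def add_by_rebinding(df):
    # assign() returns a new DataFrame; rebinding the local name df
    # leaves the caller's object unchanged, and the result is lost.
    df = df.assign(col4=df["col1"] + df["col2"])

def add_and_return(df):
    # Returning the new object lets the caller keep it.
    return df.assign(col4=df["col1"] + df["col2"])

a = pd.DataFrame({"col1": [1, 2], "col2": [10, 20]})
b = a.copy()
c = a.copy()

add_in_place(a)
add_by_rebinding(b)
c = add_and_return(c)

print("col4" in a.columns)  # True
print("col4" in b.columns)  # False
print(c["col4"].tolist())   # [11, 22]
```

Returning the frame works in both situations, which is why it is the safer habit.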
