
I have quite a difficult issue to explain, but I'll try my best. I have a function a() that calls a function b() and passes it a DataFrame (called "df_a"). I learned that Python passes a reference to the same object rather than a copy, so if inside b() I add a new column to the input DataFrame, the original one is modified too. For example:

import pandas as pd

def b(df_b):
    df_b['Country'] = "not sure"

def a():
    df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
    b(df_a)
    print(df_a)  # this DataFrame will now have the column "Country"
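Strictly speaking, b() receives a second name for the caller's object, not a copy, so any in-place mutation (such as assigning a column) is seen by the caller. A small sketch confirming that b() works on the very same object; returning `id(df_b)` is added here only so the identity can be compared:

```python
import pandas as pd

def b(df_b):
    # df_b is another name for the caller's DataFrame, not a copy
    df_b['Country'] = "not sure"   # mutates the shared object in place
    return id(df_b)                # returned only to inspect identity

df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
inner_id = b(df_a)
print(inner_id == id(df_a))        # True: same object inside and outside b()
print(df_a.columns.tolist())       # ['Name', 'Age', 'Country']
```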

So far so good. The problem is that today I realized that if inside b() I merge the DataFrame with another DataFrame, this creates a new local DataFrame, and the original df_a is left unchanged:

import pandas as pd

def b(df_b):
    df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
    df_b = pd.merge(df_b, df_c, left_on='Name', right_on='Name', how='left')

def a():
    df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
    b(df_a)
    print(df_a)  # this DataFrame will *not* have the column "Country"
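What changes in the second example is that `pd.merge` builds a new DataFrame and `df_b = ...` only rebinds the local name `df_b` to it; the object that `df_a` refers to is never modified. A sketch that makes the rebinding visible by comparing identities (the identity check and the return value are added only for illustration, and `on='Name'` is used as shorthand for `left_on='Name', right_on='Name'`):

```python
import pandas as pd

def b(df_b):
    original_id = id(df_b)
    df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
    # pd.merge returns a new object; this assignment rebinds the local name only
    df_b = pd.merge(df_b, df_c, on='Name', how='left')
    return original_id == id(df_b)

df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
same_object = b(df_a)
print(same_object)              # False: df_b now names a different DataFrame
print(df_a.columns.tolist())    # ['Name', 'Age']: the caller's frame is unchanged
```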

So my question is: how do I make sure that, in this second example, the column "Country" is also added to the original df_a DataFrame without returning it? I would prefer not to use "return df_b" inside b(), since I would have to change the logic in many, many parts of the code. Thank you.

  • In your last sentence, did you mean "I would not prefer to use"? Commented Jun 18, 2020 at 18:51
  • Yes, you're actually right. Commented Jun 18, 2020 at 18:57

1 Answer


I have modified functions b() and a() so that the changes made in b() are returned to a():

import pandas as pd

def b(df_b):
    df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
    df_b = pd.merge(df_b, df_c, left_on='Name', right_on='Name', how='left')
    return df_b

def a():
    df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
    df_a = b(df_a)  # rebind df_a to the merged DataFrame returned by b()
    print(df_a)

**Output** of calling a():

    Name  Age Country
0   Mark   30  Brazil
1  Annie   28   Japan
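If changing the callers is not an option, one way to keep b() free of `return` is to do the merge into a temporary and then copy the new columns onto `df_b` by column assignment, which mutates the caller's object just like the first example in the question. A minimal sketch, assuming that `Name` is unique in `df_c` (enforced here with `validate='many_to_one'`, so a left merge yields exactly one row per row of `df_b`, in the same order) and that the two frames share no non-key column names (otherwise pandas would add `_x`/`_y` suffixes); the temporary name `merged` is illustrative:

```python
import pandas as pd

def b(df_b):
    df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
    # Merge into a temporary; the name df_b is never rebound
    merged = pd.merge(df_b, df_c, on='Name', how='left', validate='many_to_one')
    # Copy each column the merge added onto the caller's DataFrame.
    # .to_numpy() skips index alignment: merged has a fresh RangeIndex,
    # but a left merge keeps df_b's row order, so positions line up.
    for col in merged.columns.difference(df_b.columns):
        df_b[col] = merged[col].to_numpy()

def a():
    df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
    b(df_a)       # no return value needed
    print(df_a)   # df_a now has the "Country" column
```

Only the column assignment inside the loop touches the shared object, so callers that already rely on b() mutating its argument keep working unchanged.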

4 Comments

Thanks Suraj, but you have used a return statement, and as mentioned above that's currently not an option on my end, since I assumed every change applied to df_b would also be reflected in df_a.
I think the problem is that the merge is not done in place, so a new DataFrame is created at a different location in memory, and df_b and df_a end up referring to different objects. Also, pandas.merge does not have an inplace parameter.
Right, so how can I do something similar to the inplace parameter, without creating a new DataFrame stored at a new location in memory?
I'm not aware of such a technique, man; I'll let you know if I find out.
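When the merge only brings in a single column from a lookup table, a lighter alternative is `Series.map` with the lookup table indexed by the key; the result is assigned as a new column, so the caller's DataFrame is modified in place and nothing has to be returned. A sketch assuming `Name` is unique in `df_c` (`map` needs the lookup Series to have a unique index); names with no match become NaN, as with a left merge. The extra row `'Zoe'` is hypothetical, added only to show the unmatched case:

```python
import pandas as pd

def b(df_b):
    df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
    # Series indexed by Name with Country values: acts as a lookup table
    lookup = df_c.set_index('Name')['Country']
    # map() looks up each Name; assigning the column mutates df_b in place
    df_b['Country'] = df_b['Name'].map(lookup)

df_a = pd.DataFrame({"Name": ['Mark', 'Annie', 'Zoe'], 'Age': [30, 28, 41]})
b(df_a)
print(df_a['Country'].tolist())  # ['Brazil', 'Japan', nan]
```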
