I have an issue that is quite difficult to explain, but I'll try my best. I have a function a() that calls function b() and passes it a dataframe (called "df_a"). I learned that the dataframe is passed by reference, meaning that if I add a new column to the input dataframe inside function b(), this also modifies the original one. For example:

    import pandas as pd

    def b(df_b):
        df_b['Country'] = "not sure"

    def a():
        df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
        b(df_a)
        print(df_a)  # this dataframe will now have the column "Country"

    a()

So far so good. The problem is that today I realized that if we merge the dataframe with another dataframe inside b(), this creates a new local dataframe and the original one is left unchanged:

    import pandas as pd

    def b(df_b):
        df_c = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Country': ['Brazil', 'Japan']})
        df_b = pd.merge(df_b, df_c, left_on='Name', right_on='Name', how='left')

    def a():
        df_a = pd.DataFrame({"Name": ['Mark', 'Annie'], 'Age': [30, 28]})
        b(df_a)
        print(df_a)  # this dataframe will *not* have the column "Country"

    a()

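
To check my understanding of the difference, here is a minimal sketch that compares object identity in the two cases. The names add_column and merge_inside are made up for this illustration, and the return statements are there only so the local object can be compared with the caller's one (they are not part of my real code). As far as I can tell, adding a column changes the object the caller holds, while pd.merge builds a new dataframe and the assignment only rebinds the local name:

```python
import pandas as pd

def add_column(df):
    # Column assignment mutates the very object the caller passed in.
    df["Country"] = "not sure"
    return df  # returned only for the identity check below

def merge_inside(df):
    other = pd.DataFrame({"Name": ["Mark", "Annie"],
                          "Country": ["Brazil", "Japan"]})
    # pd.merge returns a new DataFrame; this line rebinds the local
    # name df and leaves the caller's object untouched.
    df = pd.merge(df, other, on="Name", how="left")
    return df  # returned only for the identity check below

df_1 = pd.DataFrame({"Name": ["Mark", "Annie"], "Age": [30, 28]})
same_object_after_add = add_column(df_1) is df_1
print(same_object_after_add, "Country" in df_1.columns)    # True True

df_2 = pd.DataFrame({"Name": ["Mark", "Annie"], "Age": [30, 28]})
same_object_after_merge = merge_inside(df_2) is df_2
print(same_object_after_merge, "Country" in df_2.columns)  # False False
```
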
So my question is: how do I make sure that, in this second example, the column "Country" is also added to the original df_a dataframe, without returning it? I would prefer not to use "return df_b" inside function b(), since I would have to change the logic in many parts of the code. Thank you.