Python dataframe assignment

Question

My issue is a bit difficult to explain, but I'll try my best. I have a function a() that calls a function b() and passes it a dataframe (called "df_a"). I learned that the dataframe is passed by reference, meaning that if I add a new column to the input dataframe inside b(), the original dataframe is modified as well. For example:

import pandas as pd

def b(df_b):
    # Column assignment modifies the object that the caller passed in
    df_b['Country'] = "not sure"

def a():
    df_a = pd.DataFrame({"Name":['Mark','Annie'],  'Age':[30,28]})
    b(df_a)
    print(df_a) # this dataframe will now have the column "Country"

So far so good. The problem is that today I realized that if we merge the dataframe with another dataframe inside b(), this creates a new local dataframe.

def b(df_b):
    df_c = pd.DataFrame({"Name":['Mark','Annie'],  'Country':['Brazil','Japan']})
    df_b = pd.merge(df_b, df_c, left_on = 'Name', right_on='Name', how='left')

def a():
    df_a = pd.DataFrame({"Name":['Mark','Annie'],  'Age':[30,28]})
    b(df_a)
    print(df_a) # this dataframe will *not* have the column "Country"

So my question is: how do I make sure that in this second example the column "Country" is also assigned to the original df_a dataframe, without returning it? (I would prefer not to use "return df_b" inside function b(), since I would have to change the logic in many, many parts of the code.) Thank you.

In your last sentence, did you mean "I would not prefer to use"?
– DavideBrex, Commented Jun 18, 2020 at 18:51

DavideBrex · Accepted Answer · 2020-06-18 18:39:47Z

1

I have modified the functions b() and a() so that the changes made in b() are returned to a():

def b(df_b):
    df_c = pd.DataFrame({"Name":['Mark','Annie'],  'Country':['Brazil','Japan']})
    df_b = pd.merge(df_b, df_c, left_on = 'Name', right_on='Name', how='left')
    return df_b
def a():
    df_a = pd.DataFrame({"Name":['Mark','Annie'],  'Age':[30,28]})
    df_a = b(df_a)
    print(df_a)

Output of a():

    Name  Age Country
0   Mark   30  Brazil
1  Annie   28   Japan

edited Jun 18, 2020 at 18:39

DavideBrex

answered Jun 18, 2020 at 18:36

Suraj


4 Comments

Angelo Over a year ago

Thanks Suraj, but you have used a return statement, and as mentioned above that's currently not an option on my end, since I trusted that every change applied to df_b would also be reflected in df_a.

Suraj Over a year ago

I think the problem is that when you merge, you're not doing the operation in place, so a new dataframe is created, which points to a different location in memory. So df_b and df_a end up referring to different objects. Also, pandas.merge does not have an inplace parameter.
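
To illustrate the point about memory locations, here is a minimal sketch that checks object identity with `is`. The function names mutate() and rebind() are hypothetical, and they return df_b only so that the identity check can be made from the outside.

```python
import pandas as pd

def mutate(df_b):
    # Item assignment changes the object the caller passed in.
    df_b["Country"] = "not sure"
    return df_b  # returned only so identity can be checked

def rebind(df_b):
    df_c = pd.DataFrame({"Name": ["Mark", "Annie"], "Country": ["Brazil", "Japan"]})
    # pd.merge builds a new DataFrame; the assignment only rebinds the local name.
    df_b = pd.merge(df_b, df_c, on="Name", how="left")
    return df_b  # returned only so identity can be checked

df_1 = pd.DataFrame({"Name": ["Mark", "Annie"], "Age": [30, 28]})
df_2 = pd.DataFrame({"Name": ["Mark", "Annie"], "Age": [30, 28]})

mutated_is_same = mutate(df_1) is df_1   # same object as the caller's
rebound_is_same = rebind(df_2) is df_2   # a different, new object

print(mutated_is_same, rebound_is_same)
print(list(df_1.columns), list(df_2.columns))
```

In the first case the caller's dataframe gains the "Country" column; in the second, df_2 is left untouched because only the local name inside rebind() points at the merged result.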

Angelo Over a year ago

Right, so how can I do something similar to the inplace parameter without creating a new dataframe that is stored in a new location in memory?

Suraj Over a year ago

I am unaware of such a technique, man. I'll let you know if I find out.
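
For what it's worth, one possible workaround (a sketch, not something proposed in this thread) is to do the merge into a temporary dataframe and then copy the new columns back onto df_b with column assignment, which does modify the caller's object. It assumes the keys in df_c are unique, so that a left merge keeps the same number of rows in the same order as df_b; the variable name merged is made up for illustration.

```python
import pandas as pd

def b(df_b):
    df_c = pd.DataFrame({"Name": ["Mark", "Annie"], "Country": ["Brazil", "Japan"]})
    # The merge result is a new, temporary DataFrame.
    merged = pd.merge(df_b, df_c, on="Name", how="left")
    # Assumption: unique keys in df_c, so the left merge preserves row count.
    if len(merged) != len(df_b):
        raise ValueError("merge changed the number of rows; cannot copy back")
    # Copy only the columns that df_b does not have yet, by position,
    # so that the index of df_b does not matter.
    for col in merged.columns.difference(df_b.columns):
        df_b[col] = merged[col].to_numpy()

def a():
    df_a = pd.DataFrame({"Name": ["Mark", "Annie"], "Age": [30, 28]})
    b(df_a)  # no return value is used
    return df_a

result = a()
print(result)
```

If the merge can duplicate or drop rows, this approach does not apply and returning the merged dataframe is probably the safer route.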
