I have a pandas dataframe "df" on which I apply several functions. I do not want to change the values of the original dataframe. All my functions look like this:

def func(x):
    # do some stuff with x
    return x

y = func(x=df)

I do not refer to the df variable within the function, but the original dataframe gets changed anyway. Can someone explain why that is the case and how to avoid it?

  • Python is pass-by-reference. df is mutable. You're passing a reference to df to the function, which is mutating it. If you want to keep the original intact, send in a copy of df. (Commented Apr 23, 2019 at 8:00)
  • @rdas Python is not pass-by-reference. Python uses an evaluation strategy called call by sharing. That name is not well known; it is sometimes called "call by assignment" or, in the Java community, "call by value where all values are references". Whatever you want to call it, it is not call by reference. The distinguishing feature of call by reference would be that assignments to a parameter are seen by the caller, which doesn't happen in Python. (Commented Apr 23, 2019 at 18:08)
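
To make the distinction in the comment above concrete, here is a minimal sketch using a plain list instead of a dataframe. The function names `mutate` and `rebind` are made up for illustration; the point is that in-place mutation is visible to the caller, while assigning to the parameter is not.

```python
def mutate(items):
    # In-place mutation: the parameter refers to the caller's object,
    # so the caller sees the change.
    items.append(99)


def rebind(items):
    # Assignment to the parameter: only the local name is rebound,
    # the caller's object is left untouched.
    items = [0, 0, 0]
    return items


original = [1, 2, 3]

mutate(original)
after_mutate = list(original)   # snapshot after the in-place change

rebound = rebind(original)
after_rebind = list(original)   # snapshot after the rebinding

print(after_mutate)   # [1, 2, 3, 99]
print(rebound)        # [0, 0, 0]
print(after_rebind)   # [1, 2, 3, 99]
```

Under call by reference, the assignment inside `rebind` would have replaced `original` in the caller as well. It does not, which is why the dataframe in the question only changes when the function modifies it in place.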

1 Answer

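Pass a deep copy of the dataframe to the function:

y = func(x=df.copy())

DataFrame.copy() makes a deep copy by default (deep=True), so changes made inside the function do not affect the original df.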
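
As a rough illustration, assuming the function modifies a column in place (the column name "a" and the doubling step are invented here as a stand-in for "do some stuff with x"), passing a copy leaves the original untouched:

```python
import pandas as pd


def func(x):
    # Hypothetical in-place modification of the dataframe that was passed in.
    x["a"] = x["a"] * 2
    return x


df = pd.DataFrame({"a": [1, 2, 3]})

# func receives an independent copy, so df itself is not modified.
y = func(x=df.copy())

print(df["a"].tolist())  # [1, 2, 3]
print(y["a"].tolist())   # [2, 4, 6]
```

If every function should behave this way, an alternative is to put `x = x.copy()` as the first line of each function, so callers do not have to remember to copy.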

3 Comments

This will pass in a shallow copy, which may be enough :)
@Mars Not according to the documentation. It says the default for deep is True.
Good catch! I forgot that it was a pandas dataframe. Python's copy() is shallow by default. Whoops!
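
For reference, the shallow versus deep difference in Python's standard `copy` module can be sketched like this (the dictionary and its contents are made up for illustration):

```python
import copy

nested = {"values": [1, 2, 3]}

shallow = copy.copy(nested)      # new dict, but it shares the inner list
deep = copy.deepcopy(nested)     # new dict with its own copy of the inner list

# Mutate the inner list through the original object.
nested["values"].append(4)

print(shallow["values"])  # [1, 2, 3, 4]
print(deep["values"])     # [1, 2, 3]
```

This is the standard library's behavior only; as noted above, pandas' `DataFrame.copy()` defaults to `deep=True`.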
