I don't understand what the best practice is here:

I want to modify the DataFrame data inside my function. data is defined globally. However, if I add a global statement for it in the function, I get an error, because data is already a local name there.

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1, 2, 3, 4]})

def test(data):
    global data
    data =  data + 1
    return data

test(data) 
SyntaxError: name 'data' is local and global

Does that mean I cannot use the global statement when working with DataFrames?

def test2(data):
    data =  data + 1
    return data

This does not work either: the original data is not modified.

What am I missing here?

  • You don't need a global variable if you are returning the same object. Just comment out that part and run the program. Commented Feb 8, 2017 at 15:39

1 Answer

If you want to act on the global data in your function, don't pass it as a parameter:

import pandas as pd

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1,2,3,4]})
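def test():
    global data      # refer to the module-level name, not a new local one
    data = data + 1  # data + 1 builds a new DataFrame; the global name is rebound to it

test()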
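
As for why the version in the question fails: Python does not allow a name to be both a function parameter and declared global in the same function, and it rejects this when the function is compiled, before any call happens. A minimal sketch of that, using the built-in compile() on the question's function (the names bad_source and raised are made up for illustration, and the exact error wording varies between Python versions):

```python
# Source text of the function from the question: `data` is both a
# parameter and the target of a `global` statement.
bad_source = """
def test(data):
    global data
    data = data + 1
    return data
"""

try:
    # Compiling is enough to trigger the error; nothing is executed.
    compile(bad_source, "<example>", "exec")
    raised = False
except SyntaxError as exc:
    raised = True
    print("SyntaxError:", exc.msg)
```

So the error has nothing to do with DataFrames; the same thing happens with any parameter name.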

Another option would be to keep the parameter and assign the result of calling the function:

import pandas as pd

data = pd.DataFrame({'A' : [1, 2, 3, 4],
                     'B' : [1,2,3,4]})

def test(data):
    data = data + 1
    return data

data = test(data)

You can see that using the same name for both the global and local variables makes things a bit confusing. If you want to go that route, using different names could make it a bit easier on the brain:

import pandas as pd

g_data = pd.DataFrame({'A' : [1, 2, 3, 4],
                       'B' : [1,2,3,4]})

def test(data):
    data = data + 1
    return data

g_data = test(g_data)
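
To see why the result has to be assigned back, it may help to compare rebinding a local name with changing the object itself. The sketch below is only an illustration: the function names add_one_rebind and add_one_in_place are hypothetical, and assigning column by column is just one way to change a DataFrame in place.

```python
import pandas as pd

def add_one_rebind(data):
    # data + 1 builds a new DataFrame; only the local name is rebound.
    data = data + 1
    return data

def add_one_in_place(data):
    # Column assignment changes the object the caller passed in.
    for col in data.columns:
        data[col] = data[col] + 1

g_data = pd.DataFrame({'A': [1, 2, 3, 4],
                       'B': [1, 2, 3, 4]})

result = add_one_rebind(g_data)
original_unchanged = g_data['A'].tolist() == [1, 2, 3, 4]
result_incremented = result['A'].tolist() == [2, 3, 4, 5]

add_one_in_place(g_data)
original_changed = g_data['A'].tolist() == [2, 3, 4, 5]

print(original_unchanged, result_incremented, original_changed)
```

Returning the new object and assigning it at the call site, as above, keeps it obvious where g_data changes; the in-place version hides that inside the function.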
4 Comments

Interesting. But then I don't get why the return is needed here. Isn't the DataFrame modified in place at the line data = data + 1?
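It's not needed in the first version, where global makes the assignment rebind the module-level name. Note that data = data + 1 does not modify the DataFrame in place, though: it builds a new one, so in the versions that take data as a parameter the return is what hands the result back to the caller.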
That actually raises an interesting question. If you do keep the return, is the data duplicated in memory? That could crash your computer when working with large datasets. What do you think?
I don't think there would be any problem with memory in that case because really what gets returned is a reference to the object (essentially a pointer). It's good to think about stuff like that, though!
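
A quick way to check the point about references is to compare object identity with `is`. This is a small sketch with hypothetical helper names (passthrough, add_one): returning a DataFrame does not copy it, while the data + 1 expression is what allocates a new one, whether or not it is returned.

```python
import pandas as pd

def passthrough(data):
    # Returns the very same object that was passed in.
    return data

def add_one(data):
    # The addition creates a new DataFrame; return just hands it back.
    return data + 1

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [1, 2, 3, 4]})

same_object = passthrough(df) is df
new_object = add_one(df) is not df

print(same_object, new_object)
```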
