I have a DataFrame that I want to pass to a function, derive some information from, and then return that information. Originally I set up my code like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

def test_function(df):

    df['D'] = 0

    df.D = np.random.rand(len(df))

    grouped = df.groupby('A')
    df = grouped.first()
    df = df['D']

    return df


Ds = test_function(df)

print(df)
print(Ds)

Which prints:

    A  B  C         D
0   1  5  1  0.582319
1   1  5  1  0.269779
2   1  6  1  0.421593
3   1  7  1  0.797121
4   2  5  1  0.366410
5   2  6  1  0.486445
6   2  6  1  0.001217
7   3  7  1  0.262586
8   3  7  1  0.146543
9   4  6  1  0.985894
10  4  7  1  0.312070
11  4  7  1  0.498103
A
1    0.582319
2    0.366410
3    0.262586
4    0.985894
Name: D, dtype: float64

My thinking was: I don't want to copy my large DataFrame, so I will add a working column to it and then return just the information I want, without affecting the original DataFrame. This of course doesn't work. Because I didn't copy the DataFrame, the function operates on the same object as the caller, so adding a column inside the function adds it to the original. Currently I'm doing something like:

add column
results = Derive information
delete column
return results

which feels a bit kludgy to me, but I can't think of a better way to do it without copying the dataframe. Any suggestions?

1 Answer

If you do not want to add a column to your original DataFrame, you could create an independent Series and apply the groupby method to the Series instead:

def test_function(df):
    ser = pd.Series(np.random.rand(len(df)))
    grouped = ser.groupby(df['A'])
    return grouped.first()

Ds = test_function(df)

yields (the values differ from run to run because they are random)

A
1    0.017537
2    0.392849
3    0.451406
4    0.234016
dtype: float64

Thus, test_function does not modify df at all. Notice that ser.groupby can be passed a sequence of values (such as df['A']) by which to group, instead of just the name of a column.

2 Comments

Nice solution, I didn't realize you could use groupby that way.
Yes, there is a veritable plethora of objects that can be used to specify a groupby.
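
As a small illustration of that point, the sketch below groups the same Series three ways: by an array of labels, by a dict, and by a function applied to the index labels. The data and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

ser = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# 1. A NumPy array (or list) of group labels, one per row.
by_array = ser.groupby(np.array(['x', 'x', 'y', 'y'])).sum()

# 2. A dict mapping index labels to group labels.
by_dict = ser.groupby({'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y'}).sum()

# 3. A function that is called on each index label.
by_func = ser.groupby(lambda label: 'x' if label in ('a', 'b') else 'y').sum()

print(by_array)
print(by_dict)
print(by_func)
```

All three produce the same two groups here, x and y, because each grouper assigns rows a and b to x and rows c and d to y.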
