Python Pandas working with dataframes in functions

Question

I have a DataFrame which I want to pass to a function, derive some information from and then return that information. Originally I set up my code like:

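import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

def test_function(df):
    # Working column: this in-place assignment modifies the caller's
    # DataFrame, because df here is the same object that was passed in.
    df['D'] = np.random.rand(len(df))

    # Rebinding the local name df below does not affect the caller.
    grouped = df.groupby('A')
    df = grouped.first()
    df = df['D']

    return df


Ds = test_function(df)

print(df)
print(Ds)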

Which returns:

    A  B  C         D
0   1  5  1  0.582319
1   1  5  1  0.269779
2   1  6  1  0.421593
3   1  7  1  0.797121
4   2  5  1  0.366410
5   2  6  1  0.486445
6   2  6  1  0.001217
7   3  7  1  0.262586
8   3  7  1  0.146543
9   4  6  1  0.985894
10  4  7  1  0.312070
11  4  7  1  0.498103
A
1    0.582319
2    0.366410
3    0.262586
4    0.985894
Name: D, dtype: float64

My thinking was: I don't want to copy my large DataFrame, so I'll add a working column to it and then return just the information I want without affecting the original DataFrame. Of course this doesn't work: because I didn't copy the DataFrame, the new column is added to the original DataFrame itself.
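A minimal sketch of why this happens, using a smaller stand-in frame and a hypothetical helper name (add_working_column): the function receives a reference to the caller's DataFrame rather than a copy, so an in-place column assignment is visible outside, while merely rebinding the local name is not.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's DataFrame.
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [5, 6, 5, 7]})

def add_working_column(frame):
    # 'frame' is the same object as the caller's df, not a copy,
    # so this in-place column assignment is visible to the caller.
    frame['D'] = np.random.rand(len(frame))
    # Rebinding the local name does NOT affect the caller; only the
    # in-place mutation above leaks out.
    frame = frame.groupby('A')['D'].first()
    return frame

before = list(df.columns)
result = add_working_column(df)
after = list(df.columns)

print(before)  # ['A', 'B']
print(after)   # ['A', 'B', 'D']
```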

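def test_function(df):
    df['D'] = np.random.rand(len(df))        # add a working column
    results = df.groupby('A')['D'].first()   # derive the information
    del df['D']                              # delete the working column
    return results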

which feels a bit kludgy to me, but I can't think of a better way to do it without copying the dataframe. Any suggestions?
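If the add/derive/delete pattern is kept, wrapping the derivation in try/finally at least guarantees the scratch column is removed even when the derivation raises. A minimal sketch, assuming a hypothetical helper name and a scratch-column name (_tmp) that is not already a column of the DataFrame (otherwise it would be overwritten and then deleted):

```python
import numpy as np
import pandas as pd

def derive_with_temp_column(df, col='_tmp'):
    # Hypothetical helper: add a scratch column, derive the result,
    # and remove the scratch column no matter what happens in between.
    df[col] = np.random.rand(len(df))
    try:
        return df.groupby('A')[col].first()
    finally:
        # Executes on normal return and on exceptions alike.
        del df[col]

df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': [5, 6, 5, 7]})
res = derive_with_temp_column(df)
print(list(df.columns))  # ['A', 'B']
```

The DataFrame is still mutated temporarily, so this only tidies up the kludge; it does not avoid touching the caller's data.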

unutbu · Accepted Answer · 2013-12-31 21:35:00Z


If you do not want to add a column to your original DataFrame, you could create an independent Series and apply the groupby method to the Series instead:

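import numpy as np
import pandas as pd

def test_function(df):
    # Independent Series sharing df's index; df itself is never modified.
    ser = pd.Series(np.random.rand(len(df)), index=df.index)
    grouped = ser.groupby(df['A'])
    return grouped.first()

Ds = test_function(df)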

yields

A
1    0.017537
2    0.392849
3    0.451406
4    0.234016
dtype: float64

Thus, test_function does not modify df at all. Notice that ser.groupby can be passed a sequence of values (such as df['A']) by which to group, instead of just the name of a column.


2 Comments

TristanMatthews Over a year ago

Nice solution, I didn't realize you could use groupby that way.

unutbu Over a year ago

Yes, there is a veritable plethora of objects that can be used to specify a groupby.
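A short sketch of a few of the grouper types described in the pandas groupby documentation: a same-length list of keys, a dict mapping index labels to group names, and a function called on each index label. The toy Series and group names are made up for illustration.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# 1. A list of keys, one per row (matched by position).
by_list = ser.groupby(['x', 'x', 'y', 'y']).sum()

# 2. A dict mapping index labels to group names.
by_dict = ser.groupby({'a': 'low', 'b': 'low', 'c': 'high', 'd': 'high'}).sum()

# 3. A function called on each index label.
by_func = ser.groupby(
    lambda label: 'vowel' if label in 'aeiou' else 'consonant'
).sum()

print(by_list)
print(by_dict)
print(by_func)
```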
