I have the following dataframe, groupby objects, and functions.

import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

g1 = df.groupby('A')

g2 = df.groupby('P')

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def f3(x, y):
    return x * y

For g1, I want to

  • apply f1 to columns B and C
  • apply f2 to columns C and D.

For g2, I want to

  • apply f2 to columns B and C
  • apply f3 to columns C and D

The difficulty for me is that each function operates on multiple columns. I also need the functions to work for an arbitrary pair of columns; notice how f2 is used for both ['B', 'C'] and ['C', 'D']. I'm struggling with the syntax for this.

How do I use Pandas to do all of these things in Python?

  • Does this answer your question? Apply multiple functions to multiple groupby columns Commented Apr 18, 2021 at 17:08
  • Can you share your expected output? Commented Apr 18, 2021 at 17:24
  • This is a good example of how to provide useful test data. All too often people do things like "Here's some code that loads a CSV from my hard drive", and there's no way for people trying to answer the question to test their proposed code. Commented Apr 18, 2021 at 18:00
  • @AmitVikramSingh No, it does not. My functions involve operations between any 2 possible columns. That thread uses functions that involve only 1 column at a time. Commented Apr 18, 2021 at 18:47
  • @Iterator516 If you look at the section "Using apply and returning a Series" in @TedPetrou's answer, he uses multiple columns there. Commented Apr 18, 2021 at 18:55

1 Answer

I don't know if there's a simpler way to do it, but one way is to use currying. I wasn't able to find a way to use the groupby structure to add a column (the structures involved are designed around immutable data), so I just dealt with the data in the groupby object directly. See whether the following code does what you want:

def sum_curry(x, y):
    return lambda df: sum(df[x]) + sum(df[y])

def diff_curry(x, y):
    return lambda df: sum(df[x]) - sum(df[y])

def append_prod(df):
    # Work on a copy so the group's underlying data is not modified in place.
    df = df.copy()
    df['E'] = df['C'] * df['D']
    return df

g1_sums = g1.apply(sum_curry('B','C'))
g1_diffs = g1.apply(diff_curry('C','D'))
g2_diffs = g2.apply(diff_curry('B','C'))
g2_with_prod = [(key, append_prod(group)) for key, group in g2]
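
The currying idea above might be generalized so that the question's original f1 and f2 can be reused unchanged for any pair of columns. The following is only a sketch: on_columns is a hypothetical helper name (it is not part of pandas), and the columns are selected explicitly before apply on the assumption that this keeps the grouping column out of the data passed to the function.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def on_columns(func, col_x, col_y):
    # Hypothetical helper: turn a two-argument function into a
    # one-argument function that takes each group's sub-DataFrame.
    return lambda group: func(group[col_x], group[col_y])

# Each result is a Series indexed by the group key.
g1_f1 = df.groupby('A')[['B', 'C']].apply(on_columns(f1, 'B', 'C'))
g1_f2 = df.groupby('A')[['C', 'D']].apply(on_columns(f2, 'C', 'D'))
g2_f2 = df.groupby('P')[['B', 'C']].apply(on_columns(f2, 'B', 'C'))

print(g1_f1)  # a: 13, b: 32
print(g1_f2)  # a: -9, b: -16
print(g2_f2)  # p: -9, q: -6
```

Because f1 and f2 each reduce a group to a single number, every call yields one value per group; a function like f3, which returns one value per row, does not fit this pattern and needs separate handling.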

7 Comments

Thanks for your detailed reply, but g2_with_prod differs from what I expect. I edited my question to include my expected output above. What is the source of our disagreement?
@Iterator516 In your "desired output" screenshot, you have D as having 30 and 25, but I don't see those numbers in the example data that you're using.
Here is how I arrived at those numbers: For G2, I'm grouping by the column "P", so I'm adding the first 3 numbers for "p" and the last 2 numbers for "q". 9 + 10 + 11 = 30. 12 + 13 = 25.
@Iterator516 In Pandas, groupby creates an object used for aggregation. It is not aggregation itself. If you want it aggregated by sum, you have to tell Pandas that. It sounds like aggregated_df = df.groupby('P').sum() followed by aggregated_df['E'] = aggregated_df['C']*aggregated_df['D'] gets what you want.
@Accumulation Ah - OK. Thanks for correcting my understanding!
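
For reference, the aggregate-then-multiply approach from the comments above can be sketched as follows. One assumption is added here: the numeric columns are selected explicitly before calling sum(), since depending on the pandas version a bare df.groupby('P').sum() may also try to aggregate the string column A.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

# Aggregate first: one row per value of P, holding the column sums.
aggregated_df = df.groupby('P')[['B', 'C', 'D']].sum()

# Then apply the element-wise product (f3 in the question) to the sums.
aggregated_df['E'] = aggregated_df['C'] * aggregated_df['D']

print(aggregated_df)
# D is 30 for p and 25 for q, matching the numbers in the comments;
# E is 15 * 30 = 450 for p and 15 * 25 = 375 for q.
```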