I have a function that returns a dict object and I would like to take advantage of pandas/numpy's ability to perform columnwise operations/vectorization for this function across every row of a dataframe. The inputs for the function are specified in the dataframe and I want the outputs of the function to become new columns on the existing dataframe. Below is an example.

import pandas as pd

def func(a, b, c):
    return {
        "a_calc": a * 2,
        "b_calc": b * 3,
        "c_calc": c * 4
    }

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])
   a  b  c
0  1  2  3
1  4  5  6

Desired Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

I was reading this answer, and it got most of the way there, but I couldn't quite figure out how to handle the case where the function returns a dict whose keys are the desired column names.

2 Answers

Let's use some DataFrame unpacking, which passes each column to func as a keyword argument:

df.join(pd.DataFrame(func(**df)))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24

Or really getting cute:

df.assign(**func(**df))
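Both one-liners rely on the fact that ** unpacking only needs an object with keys() and __getitem__. A DataFrame has both, so func(**df) behaves like func(a=df['a'], b=df['b'], c=df['c']). A minimal sketch checking that, assuming the column names match func's parameter names exactly:

```python
import pandas as pd

def func(a, b, c):
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# ** unpacking calls df.keys() (the column labels) and then df[key] for each label,
# so func receives one whole Series per parameter
print(list(df.keys()))

# func returns a dict of Series; assign(**...) turns each key into a new column
out = df.assign(**func(**df))
print(out)

# The join spelling gives the same columns and values
joined = df.join(pd.DataFrame(func(**df)))
print(joined)
```

If df has extra columns that func does not accept, func(**df) raises a TypeError about an unexpected keyword argument; selecting the columns first, e.g. func(**df[['a', 'b', 'c']]), avoids that.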

If you cannot modify your function, you can do:

df.join(pd.DataFrame(func(df['a'], df['b'], df['c']), index=df.index))

Output:

   a  b  c  a_calc  b_calc  c_calc
0  1  2  3       2       6      12
1  4  5  6       8      15      24
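The index=df.index argument matters most when func hands back plain arrays rather than Series. A small check, using the question's func with a hypothetical non-default index ('x', 'y') so that any misalignment becomes visible:

```python
import pandas as pd

def func(a, b, c):
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

# Hypothetical string index, chosen only to make alignment visible
df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]}, index=["x", "y"])

# Series arithmetic is elementwise, and every result keeps df's index
calc = func(df["a"], df["b"], df["c"])
out = df.join(pd.DataFrame(calc, index=df.index))
print(out)

# The same values as bare NumPy arrays carry no index of their own
arrays = {name: s.to_numpy() for name, s in calc.items()}
aligned = df.join(pd.DataFrame(arrays, index=df.index))  # rows labelled x, y
misaligned = df.join(pd.DataFrame(arrays))               # rows labelled 0, 1
print(misaligned)  # the calc columns are NaN: no labels in common with df
```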

Note: we exploit the fact that func can accept Series input and operates on it elementwise (vectorized). In the general case, you need a for loop:

df.join(pd.DataFrame([func(x['a'], x['b'], x['c']) for _, x in df.iterrows()],
                     index=df.index))
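As a sketch of that general case, here is a hypothetical variant of func (func_scalar, not from the question) whose if check only works on scalars. Passing whole Series raises ValueError, but the iterrows loop still produces the desired columns:

```python
import pandas as pd

def func_scalar(a, b, c):
    # Hypothetical validation: `a < 0` on a Series yields a boolean Series,
    # and `if` on that raises "The truth value of a Series is ambiguous"
    if a < 0:
        raise ValueError("a must be non-negative")
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# The vectorized call fails for this function
try:
    func_scalar(df["a"], df["b"], df["c"])
except ValueError as exc:
    print("Series input rejected:", exc)

# One Python-level call per row; each row x is a Series indexed by column name
rows = [func_scalar(x["a"], x["b"], x["c"]) for _, x in df.iterrows()]
out = df.join(pd.DataFrame(rows, index=df.index))
print(out)
```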

Comments

In my case, func() cannot accept series data. Is there any other way besides using .iterrows()?
@AustinUlfers Can you rewrite your function to accept a pd.Series?
@ScottBoston It's a relatively long function, so I would like to avoid having to rewrite it if possible.
@AustinUlfers I think the only other way is to use df.apply with a lambda and axis=1, which is woefully slow and inefficient.
@AustinUlfers iterrows, apply, vanilla for loop are essentially equivalent. You could avoid one or another, but in general you would need to use one.
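A sketch of two other row-wise spellings discussed in these comments, assuming a hypothetical scalar-only variant of func (func_scalar). Both still call the function once per row in Python, so they are not meaningfully faster than iterrows. With apply, result_type="expand" is what spreads each returned dict into separate columns; without it, apply returns a Series of dicts.

```python
import pandas as pd

def func_scalar(a, b, c):
    # Hypothetical scalar-only function: `if` on a Series would raise ValueError
    if a < 0:
        raise ValueError("a must be non-negative")
    return {"a_calc": a * 2, "b_calc": b * 3, "c_calc": c * 4}

df = pd.DataFrame([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])

# 1) DataFrame.apply over rows; result_type="expand" turns each dict into columns
expanded = df.apply(lambda x: func_scalar(x["a"], x["b"], x["c"]),
                    axis=1, result_type="expand")
via_apply = df.join(expanded)

# 2) Plain loop over row dicts (one dict per row, keyed by column name);
#    the keys must match func_scalar's parameter names for ** to work
records = df.to_dict("records")
via_records = df.join(pd.DataFrame([func_scalar(**row) for row in records],
                                   index=df.index))

print(via_apply)
print(via_records)
```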