EDIT: Based on the comments, I have clarified the examples to depict a more realistic use case.

I want to call a function with df.apply. This function returns multiple DataFrames, and I want to combine the DataFrames from all rows into logical groups. I can't do that without a for loop, which defeats the purpose of using apply.

I have tried calling the function for each row of the DataFrame directly, and it is slower than apply. With apply, however, combining the results slows things down again.

Any tips?

import numpy as np
import pandas as pd

# Input DataFrame
data = {'Name': ['Ani', 'Bob', 'Cal', 'Dom'], 'Age': [15, 12, 13, 14], 'Score': [93, 98, 95, 99]}
df_in = pd.DataFrame(data)
print(df_in)

Output>

  Name  Age  Score
0  Ani   15     93
1  Bob   12     98
2  Cal   13     95
3  Dom   14     99

Functions to be applied>

def func1(name, age):
    # Returns a random number of rows (possibly zero) for this name
    num_rows = np.random.randint(int(age / 3))
    age_mul_1 = np.random.randint(low=1, high=age, size=num_rows)
    age_mul_2 = np.random.randint(low=1, high=age, size=num_rows)
    data = {'Name': [name] * num_rows, 'Age_Mul_1': age_mul_1, 'Age_Mul_2': age_mul_2}
    df_func1 = pd.DataFrame(data)
    return df_func1

def func2(name, age, score, other_params):
    # Row count and columns differ from func1; other_params is unused in this simplified example
    num_rows = np.random.randint(int(score / 10))
    score_mul_1 = np.random.randint(low=age, high=score, size=num_rows)
    data2 = {'Name': [name] * num_rows, 'score_Mul_1': score_mul_1}
    df_func2 = pd.DataFrame(data2)
    return df_func2

def ret_mul_df(row):
    # Called once per input row; returns one DataFrame per logical group
    df_A = func1(row['Name'], row['Age'])
    df_B = func2(row['Name'], row['Age'], row['Score'], 1)
    return df_A, df_B

What I essentially want to create is two DataFrames, df_A_combined and df_B_combined.

However, this is how I am currently combining them:

df_out = df_in.apply(lambda row: ret_mul_df(row), axis=1)
df_A_combined = pd.DataFrame()
df_B_combined = pd.DataFrame()
for ser in df_out:
    df_A_combined = df_A_combined.append(ser[0], ignore_index=True)
    df_B_combined = df_B_combined.append(ser[1], ignore_index=True)
print(df_A_combined)

Output>

  Name  Age_Mul_1  Age_Mul_2
0  Ani          7          8
1  Ani          1          4
2  Ani          1          8
3  Ani         12          6
4  Bob          9          8
5  Cal          8          7
6  Cal          8          1
7  Cal          4          8

print(df_B_combined)

Output>

   Name  score_Mul_1
0   Ani           28
1   Ani           29
2   Ani           50
3   Ani           35
4   Ani           84
5   Ani           24
6   Ani           51
7   Ani           28
8   Bob           32
9   Cal           26
10  Cal           70
11  Dom           56
12  Dom           53

How can I avoid the iteration?

func1 and func2 are calls to third-party libraries (which are very computation-intensive), and several such calls are made. Also, df_A_combined and df_B_combined cannot be combined with each other.

Note: This is a much-simplified example, and splitting the function would lead to a lot of redundancy.

  • Can you post what the two final dataframes would look like? It's not clear that you need apply() here. Commented Dec 4, 2020 at 20:24
  • Why do you need multiple dataframes? Commented Dec 4, 2020 at 22:51
  • @JonathanLeon, please see my enhanced example with the two final dataframes below. I want to avoid combining dataframes in a for loop (as below) because it's very heavy. Commented Dec 4, 2020 at 22:52
  • You should include that information in the original question. Not as an "answer" Commented Dec 4, 2020 at 22:54
  • @PaulH, I need multiple dataframes as they are sent downstream for different processing. Commented Dec 5, 2020 at 0:34

1 Answer


If this isn't what you want, post what the two dataframes should look like and I'll update the answer.

data = {'Name':['Ani','Bob','Cal','Dom'], 'Age': [15,12,13,14], 'Score': [93,98,95,99]}
df_in=pd.DataFrame(data)
print(df_in)

df_A = df_in[['Name','Age']].copy()  # copy so the new column does not trigger SettingWithCopyWarning
df_A['Age_Multiplier'] = df_A['Age'] * 3
print(df_A)

  Name  Age  Age_Multiplier
0  Ani   15              45
1  Bob   12              36
2  Cal   13              39
3  Dom   14              42

df_B = df_in[['Name','Score']].copy()  # copy so the new column does not trigger SettingWithCopyWarning
df_B['Score_Multiplier'] = df_B['Score'] * 2
print(df_B)

  Name  Score  Score_Multiplier
0  Ani     93               186
1  Bob     98               196
2  Cal     95               190
3  Dom     99               198
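
For the variable-row case in the edited question, here is a minimal sketch of one way to cut the combining cost. It is not a way to remove the per-row calls. The idea is to collect every returned pair first and call pd.concat once per group, instead of growing a DataFrame with append inside the loop (DataFrame.append was removed in pandas 2.0, and each call copied the accumulated frame). The func1 and func2 bodies are copied from the question as stand-ins for the third-party calls. Iterating over df_in.to_dict('records') instead of using apply is my substitution, chosen so that ret_mul_df can keep its row['Name'] style access. The names pairs, frames_A and frames_B are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # only to make this illustration repeatable

def func1(name, age):
    # Stand-in for the expensive third-party call from the question
    num_rows = np.random.randint(int(age / 3))
    return pd.DataFrame({
        'Name': [name] * num_rows,
        'Age_Mul_1': np.random.randint(low=1, high=age, size=num_rows),
        'Age_Mul_2': np.random.randint(low=1, high=age, size=num_rows),
    })

def func2(name, age, score, other_params):
    # Stand-in for the second third-party call; other_params is unused here
    num_rows = np.random.randint(int(score / 10))
    return pd.DataFrame({
        'Name': [name] * num_rows,
        'score_Mul_1': np.random.randint(low=age, high=score, size=num_rows),
    })

def ret_mul_df(row):
    # row is a plain dict here, so the question's row['Name'] access still works
    df_A = func1(row['Name'], row['Age'])
    df_B = func2(row['Name'], row['Age'], row['Score'], 1)
    return df_A, df_B

data = {'Name': ['Ani', 'Bob', 'Cal', 'Dom'], 'Age': [15, 12, 13, 14], 'Score': [93, 98, 95, 99]}
df_in = pd.DataFrame(data)

# One call per input row; nothing is concatenated yet
pairs = [ret_mul_df(row) for row in df_in.to_dict('records')]

# Transpose the list of (df_A, df_B) pairs into two tuples of frames
frames_A, frames_B = zip(*pairs)

# A single concat per group copies each piece once
df_A_combined = pd.concat(frames_A, ignore_index=True)
df_B_combined = pd.concat(frames_B, ignore_index=True)

print(df_A_combined)
print(df_B_combined)
```

This still makes one Python-level call per row, which is also what apply with axis=1 does under the hood. If func1 and func2 dominate the runtime, the concat change alone will not help much, and the per-row calls are where any further speed-up would have to come from.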

2 Comments

Yes, I need these df_A and df_B. As I mentioned, this is a very simplified version of the actual use case. In the real use case, the only shared column is, for example, 'Name'. The rest of the structure looks very different (including multiple rows for the same Name in df_B).
If you can post your actual structure (with dummy data if need be), we can help, but it's really not clear what your df outputs are supposed to be.
