Below I am creating 3 dataframes. df2 and df3 are both nested inside df1, in its df_name column. I am then trying to use .apply() on all the nested dataframes, and ultimately add a new column to the outer dataframe that holds a revised version of each nested dataframe.

I would like to apply the function below to every element (dataframe) in the 'df_name' column of df1. I also need to pass values from other df1 columns on the same row into the function; for example, the value 'sp' needs to be known when the function runs on df2.

In the attempt below, I would be grateful for some insight on:

- how to access the nested dataframes with .apply() and refer to values from the same row but a different column of df1
- whether there is a way to approach this using vectorization

import pandas as pd

cols = ['sales', 'sku']
names = [
    [100, 'asdf'],
    [200, 'qwer'],
    [250, 'zxcv'],
    [175, 'yuop']
]
df2 = pd.DataFrame(names, columns = cols)


cols = ['sales', 'sku']
names = [
    [80, 'nyer'],
    [60, 'cawe']
]
df3 = pd.DataFrame(names, columns = cols)


cols = ['name', 'cmpgn_type', 'df_name']
names = [
    ['dustin', 'sp', df2],
    ['jenny', 'sb', df3]
]
df1 = pd.DataFrame(names, columns = cols)


sp_cols_order = ['sales', 'sku', 'Record_Type']
sb_cols_order = ['Record_Type', 'sku', 'sales']


def cmpngs(df, type):
    df_shape = df.shape[0]
    for x in range(df_shape):
        df['Record_Type'] = 'hello'
        if type == 'sp':
            df = df[sp_cols_order]
        elif type == 'sb':
            df = df[sb_cols_order]
    return df


df1['ul_cmpgn'] = df1['df_name'].apply(cmpngs, args=(df1['cmpgn_type'],))

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])

Expected output for df1 (df2a and df3a stand for the revised versions of df2 and df3):

     name cmpgn_type df_name ul_cmpgn
0  dustin         sp     df2     df2a
1   jenny         sb     df3     df3a

Expected output for the revised df2 (df2a):

   sales   sku Record_Type
0    100  asdf       hello
1    200  qwer       hello
2    250  zxcv       hello
3    175  yuop       hello

Expected output for the revised df3 (df3a):

  Record_Type  sales   sku
0       hello     80  nyer
1       hello     60  cawe
  • Is your code working or broken? Commented Mar 15, 2022 at 0:03
  • it is currently broken - I am getting a 'The truth value of a Series is ambiguous.' error Commented Mar 15, 2022 at 0:04
  • I think I've fixed that, but first will you please add your expected output of the two print statements to the question? Commented Mar 15, 2022 at 0:06
  • This is now revised to show the expected outcome. @richardec thank you Commented Mar 15, 2022 at 0:28
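The error mentioned in the comments comes from the apply call passing args=(df1['cmpgn_type'],), which hands the entire column to every call, so the if statement in the function receives a whole Series. A minimal sketch that reproduces this; the small cmpgn_type Series is a hypothetical stand-in for df1's column:

```python
import pandas as pd

# Hypothetical stand-in for df1['cmpgn_type'], which the question passes
# to every call via args=(df1['cmpgn_type'],)
cmpgn_type = pd.Series(['sp', 'sb'])

# Comparing a whole Series with a string is element-wise: the result is a boolean Series
mask = cmpgn_type == 'sp'
print(mask.tolist())  # [True, False]

# An if statement needs a single True/False, so pandas raises ValueError here
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Comparing one scalar at a time (one value per row) is unambiguous
per_row = [value == 'sp' for value in cmpgn_type]
print(per_row)  # [True, False]
```

The fix is therefore to make sure the function sees one cmpgn_type value per call, not the whole column.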

1 Answer

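Your version fails because args=(df1['cmpgn_type'],) passes the entire cmpgn_type column to every call, so inside the function type == 'sp' compares a whole Series with 'sp', and the if statement cannot reduce that to a single True/False. That is what "The truth value of a Series is ambiguous" means. Instead, change your cmpngs function to take a single parameter, row, and call apply on the whole dataframe (not just the df_name column) with axis=1. Each call then receives one row of df1, so it can read both the nested dataframe and the cmpgn_type from that same row. Also make sure the column you add and the names in both column-order lists are spelled the same way (Record_Type); otherwise the df[...] selection raises a KeyError: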

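sp_cols_order = ['sales', 'sku', 'Record_Type']
sb_cols_order = ['Record_Type', 'sku', 'sales']


def cmpngs(row):
    # With axis=1, row is one row of df1, so the nested dataframe and the
    # campaign type from the same row are both available here
    df = row['df_name'].copy()  # copy so the original df2/df3 are not modified
    cmpgn_type = row['cmpgn_type']  # avoids shadowing the built-in type
    df['Record_Type'] = 'hello'  # a scalar is broadcast to every row, so no loop is needed
    if cmpgn_type == 'sp':
        df = df[sp_cols_order]
    elif cmpgn_type == 'sb':
        df = df[sb_cols_order]
    return df

df1['ul_cmpgn'] = df1.apply(cmpngs, axis=1)

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])

Output:

   sales   sku Record_Type
0    100  asdf       hello
1    200  qwer       hello
2    250  zxcv       hello
3    175  yuop       hello

  Record_Type   sku  sales
0       hello  nyer     80
1       hello  cawe     60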
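For comparison, here is a sketch of the same row pairing without apply: zip walks the df_name and cmpgn_type columns together, so each nested dataframe is handed the value from its own row. It rebuilds the question's data so it runs on its own; the helper revise and the cols_order dict are hypothetical names, and assign() is used so that df2 and df3 are left unmodified.

```python
import pandas as pd

# Rebuild the question's data so this sketch runs on its own
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Hypothetical lookup of the column order for each campaign type
cols_order = {
    'sp': ['sales', 'sku', 'Record_Type'],
    'sb': ['Record_Type', 'sku', 'sales'],
}


def revise(nested, cmpgn):
    """Hypothetical helper: add Record_Type, then reorder for the campaign type."""
    # assign() returns a new DataFrame, so df2/df3 themselves are not modified
    return nested.assign(Record_Type='hello')[cols_order[cmpgn]]


# zip pairs each nested dataframe with the cmpgn_type value from the same row
df1['ul_cmpgn'] = [revise(nested, cmpgn)
                   for nested, cmpgn in zip(df1['df_name'], df1['cmpgn_type'])]

print(df1['ul_cmpgn'].iloc[0])
print(df1['ul_cmpgn'].iloc[1])
```

This is still a plain Python loop (apply with axis=1 loops as well), so it is not vectorized; it only makes the row pairing explicit.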

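You can't really vectorize operations over nested dataframes: the df_name column has object dtype and only holds references to separate DataFrame objects, so apply (like any loop over that column) still calls your function once per row in Python. To get vectorized operations, the data has to live in a single flat dataframe instead.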
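If vectorized processing is the goal, one option (not part of the answer above) is to flatten the nested frames into one long dataframe. A sketch, assuming all nested frames share the same columns: pd.concat with keys=df1.index records which df1 row each nested row came from, and Series.map then copies the same-row values from df1 onto every nested row. The names flat, row and item are hypothetical.

```python
import pandas as pd

# Rebuild the question's data so this sketch runs on its own
df2 = pd.DataFrame([[100, 'asdf'], [200, 'qwer'], [250, 'zxcv'], [175, 'yuop']],
                   columns=['sales', 'sku'])
df3 = pd.DataFrame([[80, 'nyer'], [60, 'cawe']], columns=['sales', 'sku'])
df1 = pd.DataFrame([['dustin', 'sp', df2], ['jenny', 'sb', df3]],
                   columns=['name', 'cmpgn_type', 'df_name'])

# Stack all nested frames into one long frame. keys=df1.index adds an outer
# index level ('row') recording which df1 row each nested row came from;
# reset_index turns both index levels into ordinary columns.
flat = (
    pd.concat(list(df1['df_name']), keys=df1.index, names=['row', 'item'])
    .reset_index()
)

# Copy the same-row values from df1 onto every nested row in one vectorized step
flat['name'] = flat['row'].map(df1['name'])
flat['cmpgn_type'] = flat['row'].map(df1['cmpgn_type'])

# Column operations now run over all nested rows at once instead of per frame
flat['Record_Type'] = 'hello'
is_sp = flat['cmpgn_type'] == 'sp'

print(flat)
```

Per-type column orders still have to be applied per group (for example by iterating over flat.groupby('row')), because a single dataframe can only have one column order.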
