How to build a pandas dataframe in a recursive function?

Question

I am trying to implement the 'Bottom-Up Computation' algorithm in data mining (https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-050.pdf).

I need to use the 'pandas' library to create a dataframe and provide it to a recursive function, which should also return a dataframe as output. I am only able to return the final column as output, because I am unable to figure out how to dynamically build a data frame.

Here is the Python program:

import pandas as pd

def project_data(df, d):
    return df.iloc[:, d]

def select_data(df, d, val):
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)

def buc(df):
    dims = df.shape[1]
    if dims == 1:
        input_sum = sum(project_data(df, 0) )
        print(input_sum)
    else:
        dim_vals = set(project_data(df, 0).values)

        for dim_val in dim_vals:
            sub_data = slice_data_dim0(df, dim_val)
            buc(sub_data)
        sub_data = remove_first_dim(df)
        buc(sub_data)


data = {'A':[1,1,1,1,2],
        'B':[1,1,2,3,1],
        'M':[10,20,30,40,50]
        }
    
df = pd.DataFrame(data, columns = ['A','B','M'])
buc(df)

I get the following output:

But what I need is a dataframe, like this (not necessarily formatted, but a data frame):

    A   B   M
0   1   1   30
1   1   2   30
2   1   3   40
3   1   ALL 100
4   2   1   50
5   2   ALL 50
6   ALL 1   80
7   ALL 2   30
8   ALL 3   40
9   ALL ALL 150

How do I achieve this?

Georgina Skibinski · Accepted Answer · 2021-03-14 09:46:17Z


Unfortunately, pandas has no built-in function for subtotals, so the trick is to calculate them separately and concatenate them with the original dataframe.

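from itertools import combinations

import numpy as np
import pandas as pd

dim = ['A', 'B']   # dimension columns
vals = ['M']       # measure columns

# Subtotals: group by every proper, non-empty subset of the dimensions.
subtotals = [
    df.groupby(list(gr), as_index=False)[vals].sum()
    for r in range(1, len(dim))
    for gr in combinations(dim, r)
]
# Grand total: a constant key puts all rows into a single group.
total = df.groupby(np.zeros(len(df)))[vals].sum()

# Columns missing from a subtotal become NaN after concat; label them "ALL".
df = (
    pd.concat([df] + subtotals + [total])
    .sort_values(dim)
    .reset_index(drop=True)
    .fillna("ALL")
)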

Output:

      A    B    M
0     1    1   10
1     1    1   20
2     1    2   30
3     1    3   40
4     1  ALL  100
5     2    1   50
6     2  ALL   50
7   ALL    1   80
8   ALL    2   30
9   ALL    3   40
10  ALL  ALL  150
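
For the recursive approach the question asks about, one option is to have each call return a DataFrame and concatenate the pieces on the way back up. The following is only a minimal sketch under stated assumptions, not the paper's BUC implementation: the `prefix` argument and the `measure` parameter are hypothetical names introduced here, and the measure is assumed to be a single column that is summed. Unlike the output above, this version also merges rows that share the same (A, B) pair.

```python
import pandas as pd


def buc(df, measure="M", prefix=None):
    """Recursively compute all subtotals and return them as one DataFrame.

    `prefix` (hypothetical helper argument) maps each dimension fixed so
    far to its value, or to "ALL" when that dimension is rolled up.
    """
    prefix = prefix or {}
    dims = [c for c in df.columns if c != measure]
    if not dims:
        # Base case: no dimensions left, so emit a single aggregated row.
        return pd.DataFrame([{**prefix, measure: df[measure].sum()}])

    first = dims[0]
    parts = []
    # One branch per distinct value of the first dimension...
    for val in sorted(df[first].unique()):
        sub = df[df[first] == val].drop(columns=first)
        parts.append(buc(sub, measure, {**prefix, first: val}))
    # ...plus one branch where that dimension is rolled up to "ALL".
    parts.append(buc(df.drop(columns=first), measure, {**prefix, first: "ALL"}))
    return pd.concat(parts, ignore_index=True)


data = {'A': [1, 1, 1, 1, 2],
        'B': [1, 1, 2, 3, 1],
        'M': [10, 20, 30, 40, 50]}
result = buc(pd.DataFrame(data, columns=['A', 'B', 'M']))
print(result)
```

For the sample data this yields the ten rows requested in the question, in the same order. Because every level concatenates its children, the dimension columns end up holding both numbers and the string "ALL".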


5 Comments

weak_at_math Over a year ago

Thanks a million! Is there a way to do this without using itertools? I am not allowed to import anything other than pandas and numpy (yes, this is a school task).

Georgina Skibinski Over a year ago

Yes, but you would have to write your own combinations without repetition function. You can start at the original: docs.python.org/3/library/itertools.html#itertools.combinations
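
A hand-written replacement could look roughly like the recursive generator below. It is a sketch only, not the reference implementation from the itertools documentation, and it assumes the same call shape as `itertools.combinations(iterable, r)`.

```python
def combinations(items, r):
    """Yield every r-length tuple of items, without repetition, in input order."""
    items = tuple(items)
    if r == 0:
        # Exactly one way to choose nothing.
        yield ()
        return
    for i, head in enumerate(items):
        # Pair each element only with the elements after it, to avoid repeats.
        for tail in combinations(items[i + 1:], r - 1):
            yield (head,) + tail


print(list(combinations(['A', 'B'], 1)))
print(list(combinations(['A', 'B', 'C'], 2)))
```

Since it takes the same arguments, it can stand in for the imported function in the answer's list comprehension.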

weak_at_math Over a year ago

Great, thanks! I will check that out. One last thing: the values under columns A and B in the output appear with a .0, for example 1 appears as 1.0, 2 appears as 2.0, etc. This happens only for A and B; the output is fine for M. How can I fix this?

weak_at_math Over a year ago

I tried using df.astype(int), but it didn't make a difference.

Georgina Skibinski Over a year ago

Hm, the problem is that a plain integer column cannot hold missing values, so when concat introduces NaN into a numeric column it is upcast to float. Probably the easiest fix is to map all dim columns to str before you concat: for col in dim: df[col] = df[col].map(str)
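
To illustrate that suggestion, here is a small sketch using the question's sample data and a single subtotal level. The names `by_a` and `out` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2],
                   "B": [1, 1, 2, 3, 1],
                   "M": [10, 20, 30, 40, 50]})
dim = ["A", "B"]

# Cast the dimension columns to strings first, so the NaN introduced by
# concat cannot force them to float.
for col in dim:
    df[col] = df[col].map(str)

# Subtotal per A; these rows have no B column, so concat fills B with NaN.
by_a = df.groupby("A", as_index=False)["M"].sum()
out = pd.concat([df, by_a]).fillna("ALL").reset_index(drop=True)
print(out)
```

The dimension values then print as 1 and 2 rather than 1.0 and 2.0, at the cost of being strings instead of numbers.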
