Create 3 dataframes from original dataframe using existing values

Question

I have the following dataframe, from which I want to create 3 new dataframes using the values in specific columns (ppbeid, initpen and incpen) and the unique entries in the benid and id columns:

In part of my code, below, I'm using a list of unique items in the benid column, then removing any blanks from the list. This will give me some of the column headers I want in the new dataframes, but I also want the unique ids in the id column, the aforementioned list of unique benid items and, for a couple of the new dataframes, a total column too (see last 3 Excel screenshots):

lst_benids = df_ppbens.benid.unique()
lst_benids = list(filter(None, lst_benids))

# Result is: ['PENSION', 'POST', 'PRE8', 'SPOUSE', 'RULE29', 'MOD']
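One caveat with the snippet above: filter(None, ...) drops empty strings and None, but not NaN, because NaN is truthy in Python. If the blanks in benid were read in as missing values, a pandas-only version such as the sketch below may be safer. The small df_ppbens frame here is made-up sample data, not the real dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ppbens, with three kinds of "blank"
df_ppbens = pd.DataFrame(
    {"benid": ["PENSION", "", "POST", None, np.nan, "PENSION"]}
)

benids = df_ppbens["benid"]

# Keep only entries that are neither missing (None/NaN) nor empty strings
lst_benids = benids[benids.notna() & (benids != "")].unique().tolist()

print(lst_benids)
```

Note that a pivot builds its column headers from the unique benid values by itself, so this list is mainly useful for checking or reordering columns.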

I know how to achieve this in Excel using Index/Match/Match, but it's long-winded and I really want to learn how to do this in Pandas. The output should be the following 3 dataframes (which I'll then export to Excel in different worksheets):

The first dataframe should show the ppbeid entry for each corresponding benid, listed by unique id:

The second dataframe should show the initpen figures for those unique ids and the corresponding benid, with a total column at the end:

The third and final dataframe is the same as above but instead it's got the incpen column figures for corresponding benids and a total column at the end:

Any help is much appreciated and it will help me learn something I have to do manually in Excel a lot. Being new to Pandas/Python, I'm finding it confusing navigating the documents and other resources online. Thanks

Sorry, I'm trying (unsuccessfully) to figure out how to put the first dataframe as code.
– R41nMak3R, Commented Sep 12, 2022 at 20:15
Thank you Josh, I've got so much to learn about Pandas and StackOverflow (next time, I'll work out how to recreate the original dataframe - my sincerest apologies!) I really appreciate you taking the time to help. Now, I can't wait to try this solution at work in the morning (UK time!) :)
– R41nMak3R, Commented Sep 12, 2022 at 20:22

Josh Friedlander · Accepted Answer · 2022-09-12 20:14:06Z

It seems like what you want can be done with the pivot method, which is similar to Excel's pivot table.

First let's set up the data:

import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "initpen": [0] * 8 + [606.32, 1559.39],
        "incpen": [963.18, 462, 886.08, 529.32, 609.6, 0, 0, 0, 624.52, 1635.8],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST", "POST", "PRE8", "PENSION", "POST", "PENSION",],
        # I got tired of typing out the whole numbers...
        "ppbeid": [6197, 6197, 61990, 61998, 61990, 828, 828, 828, 8289, 8289],
    }
)

Then you can simply do:

df1 = df.pivot(index='id', columns='benid', values='ppbeid')

And for the others, substitute the appropriate column name (initpen or incpen) for ppbeid in the values argument.

Then, to add the total, just do:

df1['Total'] = df1.sum(axis=1)
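Putting those steps together, here is a minimal sketch that builds all three tables in one pass from the sample data above. The dictionary name pivots is just an illustrative choice, and the total is only added to the two numeric tables, since summing ppbeid identifiers would not mean much.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "initpen": [0] * 8 + [606.32, 1559.39],
        "incpen": [963.18, 462, 886.08, 529.32, 609.6, 0, 0, 0, 624.52, 1635.8],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST",
                  "POST", "PRE8", "PENSION", "POST", "PENSION"],
        "ppbeid": [6197, 6197, 61990, 61998, 61990, 828, 828, 828, 8289, 8289],
    }
)

# One pivot per value column: rows are ids, columns are benids
pivots = {
    col: df.pivot(index="id", columns="benid", values=col)
    for col in ["ppbeid", "initpen", "incpen"]
}

# Row-wise totals for the pension figures only (NaN cells are skipped by sum)
for col in ["initpen", "incpen"]:
    pivots[col]["Total"] = pivots[col].sum(axis=1)

print(pivots["incpen"])
```

Ids that have no row for a given benid (for example id 92 and PRE8) show up as NaN in the pivoted tables.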

4 Comments

R41nMak3R Over a year ago

Hi Josh, it won't let me pivot as there are duplicates in the id column. Would it work if I used the default index as the index?

Josh Friedlander Over a year ago

I think you'd get a different result. But in my example there are also duplicates in the ID column, and it wasn't a problem. What exactly is the error message you get?

R41nMak3R Over a year ago

Thanks to your solution (keeping it as the answer), I found an alternate way to pivot my dataframe, using pd.pivot_table and its parameters :)

Josh Friedlander Over a year ago

Didn't know about that one. Glad you figured it out!
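On the duplicates point in the comments above: repeated values in the id column alone are fine for pivot. It only fails when the same (id, benid) pair appears more than once, which is presumably what happened with the real data. The sketch below uses made-up data to show the error and how pivot_table avoids it by aggregating.

```python
import pandas as pd

# Hypothetical data: id 92 has two PENSION rows, so that pair repeats
dup = pd.DataFrame(
    {
        "id": [92, 92, 92, 133],
        "benid": ["PENSION", "PENSION", "POST", "PENSION"],
        "incpen": [100.0, 50.0, 25.0, 10.0],
    }
)

# Show the rows whose (id, benid) combination is not unique
print(dup[dup.duplicated(subset=["id", "benid"], keep=False)])

# pivot cannot decide which of the two values to place in the cell
pivot_error = None
try:
    dup.pivot(index="id", columns="benid", values="incpen")
except ValueError as exc:
    pivot_error = str(exc)
    print(pivot_error)

# pivot_table aggregates the duplicates instead (here by summing them)
summed = dup.pivot_table(index="id", columns="benid", values="incpen", aggfunc="sum")
print(summed)
```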

R41nMak3R · Accepted Answer · 2022-09-13 11:37:27Z

Alternate solution, although Josh's answer did point me in the right direction. Instead of using pivot, I'm using pd.pivot_table.

For the first dataframe that I needed, I used:

df1 = pd.pivot_table(df, index='pempid', columns='benid', values='ppbeid', dropna=False, fill_value='')
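One thing to watch with the call above: pivot_table defaults to aggfunc='mean', so if an index/benid pair occurs more than once, the ppbeid values are averaged, which is rarely meaningful for an identifier. A possible alternative, shown here on made-up data with an id column standing in for pempid, is aggfunc='first'. Whether the first value is the right one to keep is an assumption that depends on the real data.

```python
import pandas as pd

# Hypothetical data: the (92, PENSION) pair appears twice with different ppbeid values
dup = pd.DataFrame(
    {
        "id": [92, 92, 133],
        "benid": ["PENSION", "PENSION", "POST"],
        "ppbeid": [6197, 6199, 61990],
    }
)

# Default aggregation is the mean, which blends the two identifiers
mean_ids = pd.pivot_table(dup, index="id", columns="benid", values="ppbeid")

# aggfunc="first" keeps the first ppbeid seen for each (id, benid) pair
first_ids = pd.pivot_table(
    dup, index="id", columns="benid", values="ppbeid", aggfunc="first"
)

print(mean_ids)
print(first_ids)
```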

For the other two dataframes that I needed, I passed in additional parameters: aggfunc (to sum up my rows), margins (to add totals, which gives me the total column) and margins_name (the header for the totals). I did this separately for both initpen and incpen by changing the values parameter:

df_initpen = pd.pivot_table(df, index='pempid', columns='benid', values='initpen', dropna=False, fill_value=0, aggfunc='sum', margins=True, margins_name='Total')
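For reference, here is a sketch of the same call run against the sample data from the first answer, so it uses id rather than pempid as the index and leaves out dropna=False for brevity. It also shows that margins=True adds a Total row at the bottom as well as the Total column. Dropping that row is optional; the name df_initpen_no_row is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "initpen": [0] * 8 + [606.32, 1559.39],
        "incpen": [963.18, 462, 886.08, 529.32, 609.6, 0, 0, 0, 624.52, 1635.8],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST",
                  "POST", "PRE8", "PENSION", "POST", "PENSION"],
        "ppbeid": [6197, 6197, 61990, 61998, 61990, 828, 828, 828, 8289, 8289],
    }
)

df_initpen = pd.pivot_table(
    df,
    index="id",
    columns="benid",
    values="initpen",
    fill_value=0,          # show 0 where an id has no row for a benid
    aggfunc="sum",
    margins=True,          # adds both a total column and a total row
    margins_name="Total",
)

# Keep the Total column but remove the Total row, if only the column is wanted
df_initpen_no_row = df_initpen.drop(index="Total")

print(df_initpen)
print(df_initpen_no_row)
```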

The YouTube video "Python Pivot Tables Tutorial" has more details on how to use pd.pivot_table and helped me arrive at this solution.
