
I have the following dataframe, from which I want to create 3 new dataframes using the values in specific columns (ppbeid, initpen and incpen), arranged by the unique entries in the benid and id columns:

[Screenshot: source dataframe]

In the part of my code below, I build a list of the unique items in the benid column and then remove any blanks from it. That gives me some of the column headers I want in the new dataframes. In full, the columns I want are the unique ids from the id column, the list of unique benid items and, for two of the new dataframes, a total column (see the last 3 Excel screenshots):

lst_benids = df_ppbens.benid.unique()
lst_benids = list(filter(None, lst_benids))

# Result is: ['PENSION', 'POST', 'PRE8', 'SPOUSE', 'RULE29', 'MOD']
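One thing worth checking with this approach: `filter(None, ...)` drops `None` and empty strings, but it keeps `NaN`, because `NaN` is truthy in Python. The sketch below is only an illustration, assuming the blanks in benid could be either empty strings or `NaN`; the sample values are made up and are not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: blanks appear both as "" and as NaN.
df_ppbens = pd.DataFrame(
    {"benid": ["PENSION", "", "POST", np.nan, "PENSION", "PRE8"]}
)

benid = df_ppbens["benid"]

# Keep only rows that are neither missing nor an empty string,
# then take the unique values in order of first appearance.
lst_benids = benid[benid.notna() & (benid != "")].unique().tolist()

print(lst_benids)
```

If the blanks are only ever empty strings, the original `filter(None, ...)` line gives the same list.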

I know how to achieve this in Excel using Index/Match/Match, but it's long-winded and I really want to learn how to do this in Pandas. The output should be the following 3 dataframes (which I'll then export to Excel in different worksheets):

The first dataframe should show the ppbeid entry for each benid, listed by unique id:

[Screenshot: expected first dataframe (ppbeid by id and benid)]

The second dataframe should show the initpen figures for each unique id and its corresponding benid, with a total column at the end:

[Screenshot: expected second dataframe (initpen by id and benid, with total)]

The third and final dataframe is the same as the second, but with the incpen figures for the corresponding benids and a total column at the end:

[Screenshot: expected third dataframe (incpen by id and benid, with total)]

Any help is much appreciated, and it will help me learn to do something I currently have to do manually in Excel a lot. Being new to Pandas/Python, I'm finding it confusing to navigate the documentation and other resources online. Thanks

  • Please don't post pictures of your (code or) data Commented Sep 12, 2022 at 20:10
  • Sorry, I'm trying (unsuccessfully) to figure out how to put the first dataframe as code. Commented Sep 12, 2022 at 20:15
  • It's in my answer, at the beginning Commented Sep 12, 2022 at 20:18
  • Thank you Josh, I've got so much to learn about Pandas and StackOverflow (next time, I'll work out how to recreate the original dataframe - my sincerest apologies!) I really appreciate you taking the time to help. Now, I can't wait to try this solution at work in the morning (UK time!) :) Commented Sep 12, 2022 at 20:22

2 Answers


It seems like what you want can be done with the pivot method, which is similar to Excel's pivot table.

First let's set up the data:

import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "initpen": [0] * 8 + [606.32, 1559.39],
        "incpen": [963.18, 462, 886.08, 529.32, 609.6, 0, 0, 0, 624.52, 1635.8],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST", "POST", "PRE8", "PENSION", "POST", "PENSION"],
        # I got tired of typing out the whole numbers...
        "ppbeid": [6197, 6197, 61990, 61998, 61990, 828, 828, 828, 8289, 8289],
    }
)

Then you can simply do:

df1 = df.pivot(index='id', columns='benid', values='ppbeid')

And for the others, substitute the appropriate column name (initpen or incpen) for ppbeid.

Then, to add the total, just do:

df1['Total'] = df1.sum(axis=1)
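Putting those steps together, here is a minimal sketch that builds all three dataframes in one loop. It reuses the sample data from this answer; the `frames` dictionary and the choice to skip the total for ppbeid (a sum of identifiers is not meaningful) are my own assumptions, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "initpen": [0] * 8 + [606.32, 1559.39],
        "incpen": [963.18, 462, 886.08, 529.32, 609.6, 0, 0, 0, 624.52, 1635.8],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST",
                  "POST", "PRE8", "PENSION", "POST", "PENSION"],
        "ppbeid": [6197, 6197, 61990, 61998, 61990, 828, 828, 828, 8289, 8289],
    }
)

# One wide dataframe per value column, keyed by the column name.
frames = {}
for col in ["ppbeid", "initpen", "incpen"]:
    wide = df.pivot(index="id", columns="benid", values=col)
    if col != "ppbeid":
        # Row-wise total; missing id/benid combinations (NaN) are skipped.
        wide["Total"] = wide.sum(axis=1)
    frames[col] = wide

print(frames["incpen"])
```

Each frame could then be written to its own worksheet with `pd.ExcelWriter` and `DataFrame.to_excel(writer, sheet_name=...)`, which needs an Excel engine such as openpyxl to be installed.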

4 Comments

Hi Josh, it won't let me pivot as there are duplicates in the id column. Would it work if I used the default index as the index?
I think you'd get a different result. But in my example there are also duplicates in the ID column, and it wasn't a problem. What exactly is the error message you get?
Thanks to your solution (keeping it as the answer), I found an alternate way to pivot my dataframe, using pd.pivot_table and its parameters :)
Didn't know about that one. Glad you figured it out!
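A likely explanation for the error discussed in these comments, offered as a guess since the real data is not shown: `pivot` accepts repeated ids, but it raises a `ValueError` when the same (id, benid) pair occurs more than once, whereas `pivot_table` aggregates the repeats. The data below is hypothetical and only illustrates that difference.

```python
import pandas as pd

# Hypothetical data: id 92 has two PENSION rows, so the (id, benid) pair repeats.
dup = pd.DataFrame(
    {
        "id": [92, 92, 92],
        "benid": ["PENSION", "PENSION", "POST"],
        "incpen": [100.0, 50.0, 25.0],
    }
)

# pivot cannot decide which of the two PENSION values to keep, so it raises.
try:
    dup.pivot(index="id", columns="benid", values="incpen")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot raised: {exc}")

# Count rows per (id, benid) pair to locate the repeats.
pair_counts = dup.groupby(["id", "benid"]).size()
print(pair_counts[pair_counts > 1])

# pivot_table combines the repeated rows with the chosen aggregation.
summed = dup.pivot_table(index="id", columns="benid", values="incpen", aggfunc="sum")
print(summed)
```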

Alternate solution, although Josh's answer did point me in the right direction. Instead of using pivot, I'm using pd.pivot_table.

For the first dataframe that I needed, I used:

df1 = pd.pivot_table(df, index='pempid', columns='benid', values='ppbeid', dropna=False, fill_value='')

For the other two dataframes that I needed, I passed in additional parameters aggfunc (to sum up my rows), margins (to get a total column) and margins_name (the header for the total column). I did this separately for both initpen and incpen by changing the values parameter:

df_initpen = pd.pivot_table(df, index='pempid', columns='benid', values='initpen', dropna=False, fill_value=0, aggfunc='sum', margins=True, margins_name='Total')
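One detail to be aware of: `margins=True` adds a "Total" row at the bottom as well as a "Total" column. The sketch below shows one way to keep only the column. It uses Josh's sample data, so the index column is `id` rather than `pempid`, and the name `df_initpen_cols_only` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [92, 92, 133, 133, 133, 705, 705, 705, 588, 588],
        "benid": ["PENSION", "POST", "PRE8", "PENSION", "POST",
                  "POST", "PRE8", "PENSION", "POST", "PENSION"],
        "initpen": [0] * 8 + [606.32, 1559.39],
    }
)

df_initpen = pd.pivot_table(
    df,
    index="id",
    columns="benid",
    values="initpen",
    fill_value=0,
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# margins=True produced both a "Total" row and a "Total" column;
# drop the row if only the per-id total column is wanted.
df_initpen_cols_only = df_initpen.drop(index="Total")

print(df_initpen_cols_only)
```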

The YouTube video "Python Pivot Tables Tutorial" has more details on how to use pd.pivot_table and helped me arrive at this solution.
