3

I'm working with a big dataset in Excel, and I'm trying to find the top 25 values of a numeric column per index value.

The dataset looks like this:

Example

The Final PAC ID is the company number, and it changes from row to row (this does not show in the sample data). The PAC contribution is the number I want to sort by.

So, for example, there will be 50 contributions made by company C00003590 to different candidates, each with an amount 'PAC contribution'. I would like to get the top 25 contributions per company.

I've tried working with dictionaries, creating a dictionary for each company and adding in the candidate numbers as a string key, and the contribution as a value.

The code I have so far is the following (this might be the completely wrong way to go about it though):

import pandas as pd

df1 = pd.read_excel('Test2.xlsx')

dict_company = {}
k1 = str(df1['Final PAC ID'])
k2 = str(df1['Candidate ID'])

for each in range(0,100):
    dict_company[(k1)[each]] = {}
    dict_company[(k1)[each]] = k2[each]
    if each % 50 == 0:
        print(each)

print(dict_company)

for each in range(0,100):
    dict_company[k1][k2][each] = round(float(k1[each]))
    if each % 50:
        print(each)

print(dict_company)

3 Answers

2

I think you need nlargest, which keeps the n largest values in each group (pass 25 instead of 50 for the top 25):

df1 = df.groupby('Final PAC ID')['PAC contribution'].nlargest(50)
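
As a rough illustration of what this returns, here is a minimal sketch on a small made-up DataFrame. The column names follow the question; the IDs, the amounts, and the cutoff of 2 are assumptions chosen only to keep the output short.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 700],
})

# Keep the 2 largest contributions per company (use 25 for the real data).
top = df.groupby('Final PAC ID')['PAC contribution'].nlargest(2)

# The result is a Series indexed by (Final PAC ID, original row label),
# so the other columns are not carried along.
print(top)
```

Because only the 'PAC contribution' column is selected before nlargest, the 'Candidate ID' column is not part of the result.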

If you need all columns:

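# Move every other column into the index so it survives the groupby
cols = df.columns[~df.columns.isin(['PAC contribution','Final PAC ID'])].tolist()
# Parentheses are required to split the method chain across lines
df1 = (df.set_index(cols)
         .groupby('Final PAC ID')['PAC contribution']
         .nlargest(50)
         .reset_index())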

Another solution (can be slower):

df1 = df.sort_values('PAC contribution', ascending=False).groupby('Final PAC ID').head(50)
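
A minimal sketch of this sort-then-head approach on made-up data, to show that whole rows are kept. The values, the cutoff n = 2, and the final re-sort by company are assumptions added for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 700],
})

n = 2  # hypothetical cutoff; the question asks for 25

top_rows = (df.sort_values('PAC contribution', ascending=False)
              .groupby('Final PAC ID')
              .head(n)  # first n rows of each group, i.e. the n largest after sorting
              # Optional: order the output by company, largest contribution first
              .sort_values(['Final PAC ID', 'PAC contribution'],
                           ascending=[True, False])
              .reset_index(drop=True))

print(top_rows)
```

Unlike selecting a single column before nlargest, head returns complete rows, so 'Candidate ID' stays in the result without any index juggling.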

Finally, save to Excel with to_excel:

df1.to_excel('filename.xlsx')

2 Comments

This is exactly what I needed! How would I go about saving this back to the Excel file, though? It does not seem to change the dataframe itself if I remove the print command.
Ah right, I tried doing the same thing, but instead of making a new dataframe I tried to overwrite the current dataframe. Thank you very much for the answer though, it helps a lot.
0
df.groupby('Final PAC ID').head(50).reset_index(drop=True)

0

You can use groupby in conjunction with a dictionary comprehension here. The result is a dictionary containing your company names as keys and the sub dataframes with top 25 payments as values:

def aggregate(sub_df):
    return sub_df.sort_values('PAC contribution', ascending=False).head(25)

grouped = df.groupby('Final PAC ID')
results = {company: aggregate(sub_df)
           for company, sub_df in grouped}
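
For comparison, the same "dictionary keyed by company" idea can be sketched without pandas, which is closer to the dictionary attempt in the question. This is only an illustration: the rows, the helper name top_n_per_company, and the cutoff of 2 are all made up.

```python
import heapq
from collections import defaultdict

# Made-up rows: (Final PAC ID, Candidate ID, PAC contribution).
rows = [
    ('C1', 'H1', 500),
    ('C1', 'H2', 1500),
    ('C1', 'H3', 1000),
    ('C2', 'H1', 200),
    ('C2', 'H4', 700),
]

def top_n_per_company(rows, n):
    """Group contributions by company, then keep the n largest per company."""
    by_company = defaultdict(list)
    for company, candidate, amount in rows:
        by_company[company].append((candidate, amount))
    # heapq.nlargest returns the n biggest items, largest first.
    return {company: heapq.nlargest(n, contribs, key=lambda c: c[1])
            for company, contribs in by_company.items()}

top = top_n_per_company(rows, 2)  # use 25 for the real data
print(top)
```

Each value is a list of (candidate, amount) pairs sorted from largest to smallest contribution.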
