Working with Pandas DataFrame / Sorting

Question

I'm working with a large dataset in Excel, and I'm trying to keep the top 25 values of one column for each index value.

The dataset looks like this:

Example

The Final PAC ID is the company number; it differs from company to company (this does not show in the sample data). The PAC contribution is the number I want to sort by.

For example, company C00003590 makes 50 contributions to different candidates, each with an amount in 'PAC contribution'. I would like to get the top 25 contributions for each company.

I've tried working with dictionaries: creating a dictionary for each company, with the candidate numbers as string keys and the contributions as values.

The code I have so far is the following (though this might be completely the wrong way to go about it):

import pandas as pd

df1 = pd.read_excel('Test2.xlsx')

dict_company = {}
k1 = str(df1['Final PAC ID'])
k2 = str(df1['Candidate ID'])

for each in range(0,100):
    dict_company[k1[each]] = {}
    dict_company[k1[each]] = k2[each]
    if each % 50 == 0:
        print(each)

print(dict_company)

for each in range(0,100):
    dict_company[k1][k2][each] = round(float(k1[each]))
    if each % 50:
        print(each)

print(dict_company)

jezrael · Accepted Answer · 2017-03-30 11:55:06Z

2

I think you need nlargest:

df1 = df.groupby('Final PAC ID')['PAC contribution'].nlargest(25)
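
As a rough illustration of what this call returns, here is a minimal sketch on made-up data. The IDs and amounts are invented, only the column names come from the question, and n is reduced to 2 to keep the result short. The result is a Series whose index holds the group key plus the original row label:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 700],
})

# Largest 2 contributions within each company.
top = df.groupby('Final PAC ID')['PAC contribution'].nlargest(2)
print(top)
```

Note that only the contribution values survive here; the other columns are dropped because the selection is made on a single column.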

If you need all columns:

cols = df.columns[~df.columns.isin(['PAC contribution','Final PAC ID'])].tolist()
df1 = (df.set_index(cols)
         .groupby('Final PAC ID')['PAC contribution']
         .nlargest(25)
         .reset_index())
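
A small sketch of the same idea on invented data (n is 2 instead of the real limit, and the sample values are assumptions) shows why the extra columns are moved into the index first: index levels survive the nlargest call and come back as ordinary columns after reset_index:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 700],
})

# Every column except the group key and the value column goes into the index.
cols = df.columns[~df.columns.isin(['PAC contribution', 'Final PAC ID'])].tolist()

top_rows = (
    df.set_index(cols)
      .groupby('Final PAC ID')['PAC contribution']
      .nlargest(2)
      .reset_index()
)
print(top_rows)
```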

Another solution (it may be slower):

df1 = df.sort_values('PAC contribution', ascending=False).groupby('Final PAC ID').head(25)
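
One thing worth knowing about the sort-then-head approach: the rows come back in the globally sorted order, so companies can be interleaved. The sketch below uses invented data and n=2 to show this, and then re-sorts by company as one possible way to group the rows again:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 1200],
})

# Sort all rows by amount, then keep the first 2 rows seen for each company.
top_rows = (
    df.sort_values('PAC contribution', ascending=False)
      .groupby('Final PAC ID')
      .head(2)
)

# Optional: group the rows by company again, largest amount first.
by_company = top_rows.sort_values(
    ['Final PAC ID', 'PAC contribution'], ascending=[True, False]
)
print(by_company)
```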

Finally, save the result to Excel with to_excel:

df1.to_excel('filename.xlsx')

edited Mar 30, 2017 at 11:55

answered Mar 30, 2017 at 11:23

jezrael


2 Comments

Menno Van Dijk Over a year ago

This is exactly what I needed! How would I go about saving this back to the Excel file, though? It does not seem to change the dataframe itself if I remove the print command.

Menno Van Dijk Over a year ago

Ah, right. I tried doing the same thing, but instead of making a new dataframe I tried to overwrite the current dataframe. Thank you very much for the answer though, it helps a lot.

Filipe Aleixo · Answer · 2017-03-30 11:25:12Z

0

df.sort_values('PAC contribution', ascending=False).groupby('Final PAC ID').head(25).reset_index(drop=True)

answered Mar 30, 2017 at 11:25

Filipe Aleixo


pansen · Answer · 2017-03-30 11:27:43Z

0

You can use groupby in conjunction with a dictionary comprehension here. The result is a dictionary with the company IDs as keys and, as values, the sub-DataFrames holding each company's top 25 payments:

def aggregate(sub_df):
    return sub_df.sort_values('PAC contribution', ascending=False).head(25)

grouped = df.groupby('Final PAC ID')
results = {company: aggregate(sub_df)
           for company, sub_df in grouped}
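
To show how such a dictionary might be used afterwards, here is a minimal sketch on invented data. The n parameter on aggregate and the final pd.concat step are additions for illustration, not part of the answer above:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'Final PAC ID': ['C1', 'C1', 'C1', 'C2', 'C2'],
    'Candidate ID': ['H1', 'H2', 'H3', 'H1', 'H4'],
    'PAC contribution': [500, 1500, 1000, 200, 700],
})

def aggregate(sub_df, n=2):
    # n is a hypothetical parameter; the answer hard-codes 25.
    return sub_df.sort_values('PAC contribution', ascending=False).head(n)

results = {company: aggregate(sub_df)
           for company, sub_df in df.groupby('Final PAC ID')}

# Look up a single company's top rows.
print(results['C1'])

# Stitch the per-company frames back into one DataFrame if needed.
combined = pd.concat(list(results.values()), ignore_index=True)
```

The dictionary form is convenient when each company is processed separately; if a single table is the goal, the sort-then-head approach avoids the intermediate dictionary.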

answered Mar 30, 2017 at 11:27

pansen
