drop duplicates dataframe pandas

Question

I have a dataframe where I want to sum all "Hours" (column header) into a "total sum" for each "Name" (column header) under one "Manager" (column header). I then want to drop all duplicates before sorting the dataframe by the total hours and printing it out row by row. However, the row-by-row printout keeps showing duplicates of the same Manager.

|---------------------|------------------|---------------------|------------------|
|      Department     |     Name         | Manager             | Hours            | 
|---------------------|------------------|---------------------|------------------|
|   Department name   |     person Name  | Manager Name        |no of hours       |
|---------------------|------------------|---------------------|------------------|

def total_group(csv_file):
    df = pd.read_csv(csv_file)
    df['Total Hours'] = df.groupby(['Manager'])['Hours'].transform('sum')
    new_df = df.drop_duplicates(subset=['Department', 'Name', 'Manager']).sort_values('Total Hours')
    for index, row in new_df.iterrows():
        manager_value = row['Manager']
        total_hours = row['Total Hours']
        print("manager: {}, has: {} Total hours".format(manager_value, total_hours))


print(total_group(csv_file))

Dataframe print

df1 = df['Total Hours'] = df.groupby(['Direct Manager'])['Labor Hours'].transform('sum')
    print(df1)

result

0        450.0
1        450.0
2        450.0
3        450.0
4        450.0
         ...  
43929    320.5
43930    320.5
43931    320.5
43932    320.5
43933    320.5
Name: Hours, Length: 43934, dtype: float64

new dataframe print:

new_df = df.drop_duplicates(subset=['Department', 'Direct Manager']).sort_values('Total Hours')
    print(new_df)

Result:

          Department        Name            Hours  Total Hours
9554      Europe            Dri, Bas   ...    8.0    72.000000
34498     Product & Design  Sun, Sunn  ...    5.0    81.000000
19140     Product & Design  Oers, Len  ...    8.0   122.000000

What I would like is a dataframe like this:

          Department  Manager          Total Hours
9554      Europe      Last, First ...    72.000000
34498     Product     Last, first ...    81.000000
19140     Design      Last, First ...   122.000000

What about groupby(['Manager','Name'])['Hours']? It would be helpful if you had an example df and desired output.
– Derek Eden, Commented Nov 20, 2019 at 23:57

kcEmenike · Accepted Answer · 2019-11-21 15:36:23Z

You could try this:

df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
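To see what the one-liner above produces, here is a minimal sketch on made-up data. The sample values and the variable name `summary` are assumptions for illustration; only the column names come from the question's table.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's table.
df = pd.DataFrame({
    "Department": ["Europe", "Europe", "Product", "Product", "Design"],
    "Name": ["A", "B", "C", "C", "D"],
    "Manager": ["M1", "M1", "M2", "M2", "M3"],
    "Hours": [8.0, 4.0, 5.0, 3.0, 20.0],
})

# One row per manager: total hours and number of entries, largest total first.
summary = (
    df.groupby("Manager")
      .agg({"Hours": ["sum", "count"]})
      .sort_values(("Hours", "sum"), ascending=False)
)
print(summary)

# Row-by-row printout; after the groupby, the index holds the manager name.
for manager, row in summary.iterrows():
    print("manager: {}, has: {} Total hours".format(manager, row[("Hours", "sum")]))
```

Because the result is indexed by Manager, each manager appears exactly once, so no drop_duplicates call is needed before printing.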

edited Nov 21, 2019 at 15:36

answered Nov 21, 2019 at 0:43


4 Comments

new Q Open Wid Over a year ago

How would it be helpful?

kcEmenike Over a year ago

Grouping by 'Manager' means there will be no 'Manager' duplicates. You can then apply the agg method with specific aggregation functions (in this case, sum and count) and sort by the ('Hours', 'sum') column of the resulting MultiIndex: df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
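If you would rather keep the transform approach from the question, a possible sketch is to de-duplicate on 'Manager' alone. The sample data and the names `per_person` and `per_manager` are hypothetical; the point is that a subset containing 'Department' and 'Name' keeps one row per person, so each manager can still repeat.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's table.
df = pd.DataFrame({
    "Department": ["Europe", "Europe", "Product", "Design", "Design"],
    "Name": ["A", "B", "C", "D", "E"],
    "Manager": ["M1", "M1", "M2", "M3", "M3"],
    "Hours": [8.0, 4.0, 5.0, 20.0, 1.0],
})

# Broadcast each manager's total back onto every row of that manager.
df["Total Hours"] = df.groupby("Manager")["Hours"].transform("sum")

# Subset as in the question: rows are unique per person, so managers repeat.
per_person = df.drop_duplicates(subset=["Department", "Name", "Manager"])

# De-duplicating on Manager only keeps the first row seen for each manager.
per_manager = (
    df.drop_duplicates(subset=["Manager"])
      .sort_values("Total Hours")
      [["Department", "Manager", "Total Hours"]]
)
print(per_manager)
```

Note that this keeps only the first Department seen for each manager, so a manager who spans several departments would show just one of them.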

JRM Over a year ago

let me try it out

JRM Over a year ago

df.groupby('Manager').sum()['Hours'] gives me "nan" in total hours
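One possible cause of the NaN values, which the thread does not confirm, is index alignment: the grouped sum is indexed by manager name, so assigning it directly to a column of the original dataframe finds no matching row labels. A small sketch on hypothetical data (the column names "Misaligned", "Via transform" and "Via map" are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Manager": ["M1", "M1", "M2"],
    "Hours": [8.0, 4.0, 5.0],
})

# The aggregated Series is indexed by manager name, not by row number.
totals = df.groupby("Manager")["Hours"].sum()

# Assignment aligns on index labels: rows 0, 1, 2 do not match "M1", "M2",
# so every value in the new column is NaN.
df["Misaligned"] = totals

# Two ways to get per-row totals instead:
df["Via transform"] = df.groupby("Manager")["Hours"].transform("sum")
df["Via map"] = df["Manager"].map(totals)
print(df)
```

If the goal is one row per manager rather than a new column, printing `totals` itself (or calling `totals.reset_index()`) avoids the alignment step entirely.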
