I have a dataframe where I want to sum all "Hours" (column header) for every "Name" (column header) under one "Manager" (column header) into a total. I then want to drop all duplicates, sort the dataframe by the total hours, and print it out row by row. However, I keep getting duplicate Manager rows in the row-by-row printout.

|---------------------|------------------|---------------------|------------------|
|      Department     |     Name         | Manager             | Hours            | 
|---------------------|------------------|---------------------|------------------|
|   Department name   |     person Name  | Manager Name        |no of hours       |
|---------------------|------------------|---------------------|------------------|
import pandas as pd


def total_group(csv_file):
    df = pd.read_csv(csv_file)
    # Broadcast each manager's summed hours onto every row of that manager
    df['Total Hours'] = df.groupby(['Manager'])['Hours'].transform('sum')
    # Keep one row per (Department, Name, Manager) combination, then sort by the total
    new_df = df.drop_duplicates(subset=['Department', 'Name', 'Manager']).sort_values('Total Hours')
    for index, row in new_df.iterrows():
        manager_value = row['Manager']
        total_hours = row['Total Hours']
        print("manager: {}, has: {} Total hours".format(manager_value, total_hours))


total_group(csv_file)

Dataframe print

df1 = df['Total Hours'] = df.groupby(['Direct Manager'])['Labor Hours'].transform('sum')
print(df1)

result

0        450.0
1        450.0
2        450.0
3        450.0
4        450.0
         ...  
43929    320.5
43930    320.5
43931    320.5
43932    320.5
43933    320.5
Name: Hours, Length: 43934, dtype: float64

new dataframe print:

new_df = df.drop_duplicates(subset=['Department', 'Direct Manager']).sort_values('Total Hours')
print(new_df)

Result:

       Department        Name       ...  Hours  Total Hours
9554   Europe            Dri, Bas   ...  8.0    72.000000
34498  Product & Design  Sun, Sunn  ...  5.0    81.000000
19140  Product & Design  Oers, Len  ...  8.0    122.000000

what I would like is a dataframe like this:

       Department  Manager      ...  Total Hours
9554   Europe      Last, First  ...  72.000000
34498  Product     Last, first  ...  81.000000
19140  Design      Last, First  ...  122.000000
  • What about groupby(['Manager','Name'])['Hours']? It would be helpful if you had an example df and the desired output. Commented Nov 20, 2019 at 23:57
  • I can post that - give me some time Commented Nov 21, 2019 at 0:01
  • OK, I have tried adding what it looks like. Commented Nov 21, 2019 at 15:32

1 Answer

You could try this:

df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
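For illustration, here is a minimal sketch of the same idea on a small made-up dataframe. The sample values and the output column name Total_Hours are assumptions, not taken from the question; only the column headers Department, Name, Manager and Hours follow the table above. It also assumes a pandas version that supports named aggregation (0.25 or later), and it picks the first Department seen for each manager, which may not be what you want if a manager spans several departments.

```python
import pandas as pd

# Hypothetical sample data; only the column headers come from the question.
df = pd.DataFrame({
    "Department": ["Europe", "Europe", "Product", "Product", "Design"],
    "Name": ["A", "B", "C", "D", "E"],
    "Manager": ["M1", "M1", "M2", "M2", "M3"],
    "Hours": [8.0, 4.0, 5.0, 2.5, 1.0],
})

# One row per manager: grouping collapses the duplicates, so no
# drop_duplicates call is needed afterwards.
summary = (
    df.groupby("Manager", as_index=False)
      .agg(Department=("Department", "first"), Total_Hours=("Hours", "sum"))
      .sort_values("Total_Hours")
)

# Print row by row, in the same format as the loop in the question.
for row in summary.itertuples(index=False):
    print("manager: {}, has: {} Total hours".format(row.Manager, row.Total_Hours))
```

Because the grouping key is Manager alone, each manager can appear only once in summary, whatever the Name and Department values look like.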

4 Comments

How would it be helpful?
Grouping by 'Manager' means there will be no 'Manager' duplicates. You can then apply the agg method with specific aggregation functions (in this case, sum and count), and sort by the ('Hours', 'sum') column of the resulting multi-index: df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
let me try it out
df.groupby('Manager').sum()['Hours'] gives me "nan" in total hours
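One possible cause of the NaN values, assuming the grouped sum was assigned straight back to a 'Total Hours' column (the comment does not say so, so this is a guess): the grouped result is indexed by manager name, while the dataframe is indexed by row number, and pandas aligns on the index during assignment. A minimal sketch with made-up data; the Misaligned column name is only for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Manager": ["M1", "M1", "M2"],
    "Hours": [8.0, 4.0, 5.0],
})

# The aggregated Series is indexed by manager name ("M1", "M2").
totals = df.groupby("Manager")["Hours"].sum()

# Direct assignment aligns "M1"/"M2" against the row labels 0, 1, 2.
# Nothing matches, so every value becomes NaN.
df["Misaligned"] = totals

# Looking each row's manager up in the totals avoids the alignment problem.
df["Total Hours"] = df["Manager"].map(totals)

print(df)
```

Using transform('sum'), as in the question, gives the same per-row totals as the map call, because transform returns a result that keeps the original row index.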
