
I have one CSV file with thousands of rows; an example is below. What I need to do is:

  • Iterate over the pandas DataFrame.
  • Create a separate file for every department with its list of staff (groupby or .loc).
  • Name each file after the department and a staff member: if a manager is found, use the manager's name; if there is no manager, look for an officer; if there is neither a manager nor an officer, write a comment in the last column. For example, the file should be named IT-name2.csv, because name2 is the IT manager in the picture/example below.

So I have two variables: the department name and the staff name.

I was able to do this, but with a lot of manual work, which should not be necessary. I name every CSV file myself: I have hundreds of lines and type the manager's name into each CSV filename by hand, which has caused some errors, and the names could change in the future.

Right now, for every department I have one groupby line and one line to save the CSV file (with the manager's name entered manually in the filename).

How can this be made more automated?

Many thanks.

[Image: sample of the CSV data]

  • Use .loc and then write to CSV. Convert the Role column with tolist(). If you give more details, I can code it up. Commented Jul 21, 2020 at 14:11
  • @sid, what details can I add to help you understand it? I shared a sample of the file that I have; it's the same concept. Commented Jul 21, 2020 at 14:12

2 Answers

It sounds like you're nearly there with the groupby. How about adding a custom function to modify the csv name depending on what you find in the groupby?

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(data={
    "Dept"    : np.random.choice(["IT", "HR", "Sales"], 20),
    "Staff"   : ["name" + num for num in np.random.randint(0,5,20).astype(str)],
    "Role"    : np.random.choice(a=["Manager", "Officer", "Admin"], size=20, p=[0.1, 0.3, 0.6]),
    "Comment" : [None] * 20
})

def to_csv(group):
    # Each call receives the rows for a single department.
    roles = group["Role"].tolist()
    dept = group["Dept"].iloc[0]
    staff_name = "NotFound"  # fallback when there is no manager or officer

    # Prefer a manager; otherwise fall back to the first officer.
    if "Manager" in roles:
        staff_name = group["Staff"].iloc[roles.index("Manager")]
    elif "Officer" in roles:
        staff_name = group["Staff"].iloc[roles.index("Officer")]

    group.to_csv(f"{dept}-{staff_name}.csv", index=False)

df.groupby("Dept").apply(to_csv)

list.index() returns the position of the first match, which you can use to grab the name at that position from the group. It might not be the fastest approach, but hopefully it gets the job done.
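
The question also asks for a note in the last column when a department has neither a manager nor an officer, which the function above does not cover. Below is a minimal sketch of one way to add that, assuming the column names from the sample frame (Dept, Staff, Role, Comment). The helper names pick_name and write_department_files, the out_dir parameter, and the comment wording are all hypothetical.

```python
from pathlib import Path

import pandas as pd


def pick_name(group, priority=("Manager", "Officer")):
    """Return the first staff name holding the highest-priority role, else None."""
    for role in priority:
        matches = group.loc[group["Role"] == role, "Staff"]
        if not matches.empty:
            return matches.iloc[0]
    return None


def write_department_files(df, out_dir="."):
    """Write one CSV per department and return the paths that were written."""
    out_dir = Path(out_dir)
    written = []
    for dept, group in df.groupby("Dept"):
        group = group.copy()  # work on a copy, not a slice of the original frame
        name = pick_name(group)
        if name is None:
            name = "NotFound"
            # Hypothetical wording for the note in the last column.
            group["Comment"] = "No manager or officer found"
        path = out_dir / f"{dept}-{name}.csv"
        group.to_csv(path, index=False)
        written.append(path)
    return written
```

Calling write_department_files(df) writes into the current directory. Returning the paths makes it easy to check which files were produced.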

2 Comments

Hi, your response works perfectly. I just need a small modification, if you don't mind: the file name should be dept-staffname.csv, so I need to add the department name as well. Also, why did you use .apply? Many thanks.
.apply is just a convenient way to access each "mini-dataframe" filtered to a department name - this "mini-dataframe" gets passed in as argument to the function you supply to .apply. I also edited the f-string to include department.
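
To make the "mini-dataframe" idea concrete, here is a short sketch with made-up data. Iterating over a groupby yields one sub-frame per department, which is the same kind of object that .apply hands to the supplied function.

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["IT", "HR", "IT"],
    "Staff": ["name1", "name2", "name3"],
})

# Each iteration yields the group key and the rows belonging to that key.
sizes = {}
for dept, group in df.groupby("Dept"):
    sizes[dept] = len(group)
    print(dept, group["Staff"].tolist())
```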

I wasn't able to understand the complete requirement, but here is an answer that should help.

Get a list of unique departments:

dept_list = list(set(df['Dept.'].tolist()))
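
A set does not preserve row order, so the departments may come out in a different order from one run to the next. If order matters, Series.unique() returns the distinct values in order of first appearance. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Dept.": ["Sales", "IT", "Sales", "HR", "IT"]})

# unique() keeps first-appearance order, unlike set().
dept_list = df["Dept."].unique().tolist()
print(dept_list)
```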

Now we loop over the list of unique departments and filter the dataframe for each one:

for dept in dept_list:
    # Keep only the rows that belong to this department.
    sub_df = df.loc[df['Dept.'] == dept]
    roles = sub_df['Role'].tolist()
    # The file name should be dept-<manager or officer name>.csv.
    # Check this department's roles (sub_df), not the whole dataframe.
    if 'Manager' in roles:
        name_employee = sub_df[sub_df['Role'] == 'Manager'].iloc[-1]['name']
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))
    elif 'Officer' in roles:
        name_employee = sub_df[sub_df['Role'] == 'Officer'].iloc[-1]['name']
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))

7 Comments

Hi, thank you for your response. I think it will work, but something is wrong: no CSV files were created. Also, what does sub_df = df[df['Dept.'] == dept] do? I printed it and it only has one department listed. Can you please explain?
Try now, I added a .loc. Basically, we are splitting the dataframe into a sub-dataframe that contains only one department's rows.
Still no CSV files created, and no error. @sid
Are you checking in the right directory? Check the working directory. If the code ran, then the files should have been created. Add a print in the if and elif branches to check whether execution reaches the to_csv part of the code.
Yes, I am sure. I am using Jupyter, so any output file will be on the Home page. I have a doubt about one thing that I will check in the morning. Please stay connected; I will confirm tomorrow. Many thanks.
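
On the working-directory question raised in the comments above, one quick check is to print the absolute path of every file as it is written. Below is a minimal sketch; the helper name write_and_report and its directory parameter are hypothetical.

```python
from pathlib import Path

import pandas as pd


def write_and_report(df, filename, directory=None):
    """Write df to CSV and print the absolute path, so the file is easy to find."""
    # With no directory given, fall back to the current working directory.
    directory = Path(directory) if directory is not None else Path.cwd()
    path = (directory / filename).resolve()
    df.to_csv(path, index=False)
    print(f"Wrote {len(df)} rows to {path}")
    return path
```

If nothing is printed at all, the code never reached the write step, which points to the role check rather than the directory.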