Create many CSV Files based on Pandas df column value

Question

I have one CSV file with thousands of rows; below is an example. What I need to do is:

Iterate over the pandas df

Create a separate file for every department with its list of staff (groupby or .loc).
To name the file: if a manager is found, name the file with the manager's name; if there is no manager, look for an officer; if there is neither a manager nor an officer, write a comment in the last column. For example, the file should be named IT-name2.csv, because name2 is the manager in the picture/example below.
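The naming rule above (manager first, then officer, otherwise a comment) can be sketched as a small helper. This is a minimal sketch, not code from the thread: the column names Dept, Staff, Role and Comment follow the sample data in the accepted answer, the helper name csv_name_and_rows and the sample rows are hypothetical, and the "NotFound" fallback name is an assumption because the question does not say how such files should be named.

```python
import pandas as pd

def csv_name_and_rows(dept, group):
    """Hypothetical helper: choose the file name for one department's rows.

    Prefers the first Manager, then the first Officer. If neither exists,
    it fills the (assumed) "Comment" column and falls back to "NotFound".
    """
    group = group.copy()  # avoid modifying the original DataFrame
    roles = group["Role"].tolist()
    for role in ("Manager", "Officer"):
        if role in roles:
            # list.index gives the position of the first match; .iloc is positional
            staff_name = group["Staff"].iloc[roles.index(role)]
            return f"{dept}-{staff_name}.csv", group
    # Neither role found: record it in the last column
    group["Comment"] = "No manager or officer found"
    return f"{dept}-NotFound.csv", group

# Hypothetical sample data shaped like the question's example
df = pd.DataFrame({
    "Dept": ["IT", "IT", "HR", "HR"],
    "Staff": ["name1", "name2", "name3", "name4"],
    "Role": ["Admin", "Manager", "Admin", "Admin"],
    "Comment": [None, None, None, None],
})

results = {}
for dept, group in df.groupby("Dept"):
    file_name, rows = csv_name_and_rows(dept, group)
    results[file_name] = rows
    # rows.to_csv(file_name, index=False)  # uncomment to actually write the files
```

With this sample data the loop produces the names IT-name2.csv and HR-NotFound.csv; uncommenting the to_csv line writes one file per department into the current working directory.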

So I have two variables: the department name and the staff name.

I was able to do this, but it takes a lot of manual work, which should not be necessary. I name every CSV file myself: I have hundreds of lines, and I type the manager's name into each CSV filename by hand, which has caused some errors, and the names could change in the future.

Right now, for every department I have one groupby line and one line to save the CSV file (with the manager's name entered manually in the filename).

How can this be automated?

Many thanks

Use loc and then write to CSV. Convert the Role column to a list with tolist(). If you give more details I can code it up. – Sid, commented Jul 21, 2020 at 14:11
@Sid, what details can I add to help you understand it? I actually shared a sample of the file that I have; it's the same concept. – Eithar, commented Jul 21, 2020 at 14:12

gherka · Accepted Answer · 2020-07-27 09:03:10Z

2

It sounds like you're nearly there with the groupby. How about adding a custom function to modify the csv name depending on what you find in the groupby?

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(data={
    "Dept"    : np.random.choice(["IT", "HR", "Sales"], 20),
    "Staff"   : ["name" + num for num in np.random.randint(0,5,20).astype(str)],
    "Role"    : np.random.choice(a=["Manager", "Officer", "Admin"], size=20, p=[0.1, 0.3, 0.6]),
    "Comment" : [None] * 20
})

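def to_csv(group):
    # group is the sub-DataFrame holding one department's rows
    roles = group["Role"].tolist()
    dept = group["Dept"].iloc[0]
    staff_name = "NotFound"  # used when there is neither a Manager nor an Officer

    # Prefer the first Manager, otherwise the first Officer
    if "Manager" in roles:
        staff_name = group["Staff"].iloc[roles.index("Manager")]
    elif "Officer" in roles:
        staff_name = group["Staff"].iloc[roles.index("Officer")]

    group.to_csv(f"{dept}-{staff_name}.csv", index=False)

# Calls to_csv once per department
df.groupby("Dept").apply(to_csv)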

list.index() returns the position of the first match, which you can use to grab the name at that position in the group. It might not be the fastest approach, but hopefully it will get the job you have in mind done.
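A small illustration of why the position from list.index() is paired with .iloc: a group keeps the original row labels of the full DataFrame, so label-based lookup with that position would point at the wrong row (or none). The one-department group below is hypothetical. Note also that list.index() raises ValueError when there is no match, which is why the answer checks "Manager" in roles first.

```python
import pandas as pd

# Hypothetical one-department group, as .apply would pass it in;
# it keeps the original row labels (10, 42, 57), not 0..n-1
group = pd.DataFrame(
    {"Staff": ["name4", "name2", "name7"], "Role": ["Officer", "Manager", "Manager"]},
    index=[10, 42, 57],
)

roles = group["Role"].tolist()
pos = roles.index("Manager")           # position of the first "Manager" in the list
staff_name = group["Staff"].iloc[pos]  # .iloc is positional, so the labels don't matter
print(pos, staff_name)
```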

edited Jul 27, 2020 at 9:03

answered Jul 21, 2020 at 14:59

gherka


2 Comments

Eithar Over a year ago

Hi, your response works perfectly. I just need a small modification, if you don't mind: the file name should be dept-staffname.csv, so I need to add the department name as well. Also, why did you use .apply? Many thanks

gherka Over a year ago

.apply is just a convenient way to access each "mini-dataframe" filtered to one department name: each "mini-dataframe" is passed as the argument to the function you supply to .apply. I also edited the f-string to include the department.
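The same "mini-dataframes" can be reached without .apply by iterating over the groupby object, which yields (key, sub-DataFrame) pairs. This is an alternative sketch, not the answer's code: since to_csv is used only for its side effect (writing a file) and returns None, a plain loop makes that explicit. As far as I know, recent pandas versions (2.2 and later) also emit a DeprecationWarning when apply operates on the grouping column, which the to_csv function reads via group["Dept"]; the loop avoids that. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Dept": ["IT", "HR", "IT"],
    "Staff": ["name1", "name2", "name3"],
})

# Each iteration gives the department key and that department's sub-DataFrame,
# the same object .apply would pass to the function
seen = {}
for dept, group in df.groupby("Dept"):
    seen[dept] = group["Staff"].tolist()
    # here you would call group.to_csv(...) with the name chosen for this dept
print(seen)
```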

Sid · Accepted Answer · 2020-07-27 20:38:49Z

1

I wasn't able to understand the complete requirement, but here is an answer that should help:

Get a list of unique departments:

dept_list = list(set(df['Dept.'].tolist()))
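As a side note on this step: set() does not preserve the order in which departments appear, while Series.unique() returns the distinct values in order of first appearance. A minimal sketch with hypothetical data, using the answer's 'Dept.' column name:

```python
import pandas as pd

# Hypothetical data using the answer's column name
df = pd.DataFrame({"Dept.": ["Sales", "IT", "Sales", "HR", "IT"]})

# Distinct departments in order of first appearance
dept_list = df["Dept."].unique().tolist()
print(dept_list)
```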

Now loop over the unique departments and work on the part of the DataFrame that belongs to each one:

for dept in dept_list:
    # Rows for this department only; the column names 'Dept.', 'Role' and 'name' are assumed to match the input file
    sub_df = df.loc[df['Dept.'] == dept]
    # The file name should be dept-manager/officer name.csv
    # Check the roles within this department (sub_df), not the whole df
    roles = sub_df['Role'].tolist()
    if 'Manager' in roles:
        name_employee = sub_df.loc[sub_df['Role'] == 'Manager', 'name'].iloc[-1]
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))
    elif 'Officer' in roles:
        name_employee = sub_df.loc[sub_df['Role'] == 'Officer', 'name'].iloc[-1]
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))

edited Jul 27, 2020 at 20:38

answered Jul 22, 2020 at 1:57

Sid


7 Comments

Eithar Over a year ago

Hi, thank you for your response. I think it will work, but something is wrong: no CSV files were created. Also, what does sub_df = df[df['Dept.'] == dept] do? I printed it, and it only has one Dept listed. Can you please explain?

Sid Over a year ago

Try now, I added a .loc. Basically, we are splitting the DataFrame into a sub-DataFrame that contains only one department's rows.
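To make the split concrete: df['Dept.'] == dept builds a boolean Series, and indexing with it keeps only the rows where it is True, so seeing a single department in sub_df is the expected result. For a boolean Series, df[mask] and df.loc[mask] select the same rows, so adding .loc does not change the behavior. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data using the answer's column names
df = pd.DataFrame({
    "Dept.": ["IT", "HR", "IT"],
    "name": ["name1", "name2", "name3"],
    "Role": ["Manager", "Officer", "Admin"],
})

mask = df["Dept."] == "IT"  # True where the row belongs to IT
sub_df = df.loc[mask]       # only the True rows, with their original labels
same_rows = df[mask].equals(df.loc[mask])
print(mask.tolist(), sub_df["name"].tolist(), same_rows)
```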

Eithar Over a year ago

@Sid Still no files created: there is no error, but no CSV files are created.

Sid Over a year ago

Are you checking the right directory? Look at the working directory. If the code ran, the files should have been created. Add a print in the if and elif branches to check whether it gets to the to_csv part of the code.

Eithar Over a year ago

Yes, I am sure. I am using Jupyter, so any output file will be on the Home page. I have a suspicion about one thing; I will check in the morning and confirm tomorrow. Many thanks
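Following Sid's suggestion to check the working directory: to_csv resolves a relative file name against the process's current working directory, and in Jupyter that is usually the folder containing the notebook, which is not necessarily the Home page. A minimal sketch that prints where a file would land; the file name is just the example from the question.

```python
from pathlib import Path

# A relative name passed to to_csv ends up in the current working directory
file_name = "IT-name2.csv"  # example name from the question
target = Path.cwd() / file_name

print("Working directory:", Path.cwd())
print("to_csv would write to:", target)
print("File exists:", target.exists())
```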
