I have a large CSV file that looks like this:
ID,PostCode,Value
H1A0A1-00,H1A0A1,0
H1A0A1-01,H1A0A1,0
H1A0A1-02,H1A0A1,0
H1A0A1-03,H1A0A1,0
H1A0A1-04,H1A0A1,1
H1A0A1-05,H1A0A1,0
H1A1G7-0,H1A1G7,0
H1A1G7-1,H1A1G7,0
H1A1G7-2,H1A1G7,0
H1A1N6-00,H1A1N6,0
H1A1N6-01,H1A1N6,0
H1A1N6-02,H1A1N6,0
H1A1N6-03,H1A1N6,0
H1A1N6-04,H1A1N6,0
H1A1N6-05,H1A1N6,0
...
I want to split it by PostCode and save all rows that share a postal code to their own CSV file. I have tried:
postals = data['PostCode'].unique()
for p in postals:
    # Select the rows for this postal code and write them to their own file.
    df = data[data['PostCode'] == p]
    df.to_csv(directory + '/output/demographics/' + p + '.csv', header=False, index=False)
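For reference, the same split can probably be written as a single pandas groupby pass, which avoids rescanning the whole frame once per postal code. This is only a minimal sketch: the small sample frame and the temporary output directory are assumptions made so that it runs on its own, and they stand in for the real data and the real output path.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample mirroring a few of the rows shown above.
data = pd.DataFrame({
    "ID": ["H1A0A1-00", "H1A0A1-01", "H1A1G7-0", "H1A1N6-00"],
    "PostCode": ["H1A0A1", "H1A0A1", "H1A1G7", "H1A1N6"],
    "Value": [0, 0, 0, 0],
})

# Assumed output location; a temporary directory stands in for the real path.
out_dir = tempfile.mkdtemp()

# groupby hands back each postal code together with its rows, so the frame
# is partitioned once instead of being filtered for every unique value.
written = []
for postcode, group in data.groupby("PostCode"):
    path = os.path.join(out_dir, f"{postcode}.csv")
    group.to_csv(path, header=False, index=False)
    written.append(path)
```

This is still single-process, so it does not answer the Dask part of the question; it only removes the repeated boolean filtering.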
Is there a way to do this using Dask to leverage multiprocessing? Thanks