Splitting Dataframe based on duplicate values into multiple csv files

Question

I have a dataset with multiple columns, but I am only concerned with one column called 'VAL'. Every value in this column ranges from 0 to 4, so I would like to split the data into 5 separate DataFrames, one per distinct value, and then export each DataFrame to its own CSV file.

I have been able to sort the rows by this column using pandas, but now I need to divide them into smaller datasets. I have multiple files I would like to do this to, so possibly a for loop?

This is what I currently have as output:

 A       B      C      D      E      F      G         VAL   FILE
954     380    158    166    431    201    769         0  001.csv
1142    348    203    962      0    878   1023         0  001.csv
1688    279    229      0    488   1007      0         0  001.csv
4792    371    420     29    372      0    745         0  001.csv
2106    352     76    196    388      0    695         0  001.csv
    ...    ...    ...    ...    ...    ...       ...      ...
5634    441    283    277    788     45    585         4  001.csv
827     672    606     24   1023    463    742         4  001.csv
6703    324    203      0    623    214    726         4  001.csv
9056    604    398      0    981      0    633         4  001.csv
0       574    338    144    942    608    793         4  001.csv

This is roughly what I would like it to look like:

 A       B      C      D      E      F      G         VAL   FILE
954     380    158    166    431    201    769         0  val_0.csv
1142    348    203    962      0    878   1023         0  val_0.csv
1688    279    229      0    488   1007      0         0  val_0.csv
4792    371    420     29    372      0    745         0  val_0.csv
2106    352     76    196    388      0    695         0  val_0.csv


 A       B      C      D      E      F      G         VAL   FILE
5634    441    283    277    788     45    585         4  val_4.csv
827     672    606     24   1023    463    742         4  val_4.csv
6703    324    203      0    623    214    726         4  val_4.csv
9056    604    398      0    981      0    633         4  val_4.csv
0       574    338    144    942    608    793         4  val_4.csv

Does this answer your question? Save grouped by results into separate CSV files
AMC, commented Dec 15, 2019 at 1:44

Umar.H · Accepted Answer · 2020-11-27 09:10:13Z

3

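Change the FILE column to match your expected output, then read the data in:

import pandas as pd

# Read the whitespace-separated table from the clipboard
df = pd.read_clipboard(sep=r'\s+')

Then group by VAL and write each group to its own CSV: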

for group,data in df.groupby('VAL'):
    data.to_csv(f"val_{group}.csv",index=False)

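This writes two CSV files for me from your sample data, since only VAL 0 and 4 appear in it. Printing each group shows what goes into each file: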

for group,data in df.groupby('VAL'):
    print(data)
          A    B    C    D    E     F     G VAL       FILE
0   954  380  158  166  431   201   769   0  val_0.csv
1  1142  348  203  962    0   878  1023   0  val_0.csv
2  1688  279  229    0  488  1007     0   0  val_0.csv
3  4792  371  420   29  372     0   745   0  val_0.csv
4  2106  352   76  196  388     0   695   0  val_0.csv
       A    B    C    D     E    F    G VAL       FILE
6   5634  441  283  277   788   45  585   4  val_4.csv
7    827  672  606   24  1023  463  742   4  val_4.csv
8   6703  324  203    0   623  214  726   4  val_4.csv
9   9056  604  398    0   981    0  633   4  val_4.csv
10     0  574  338  144   942  608  793   4  val_4.csv
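The question also mentions having several input files. The loop above can be wrapped in a function and applied to each file in turn. This is only a minimal sketch under some assumptions: the helper names `split_csv_by_column` and `split_all`, the `*.csv` pattern, and the `<source>_val_<value>.csv` naming scheme are all illustrative and not part of the original answer. The source file's stem is added to the output name so that groups from different inputs do not overwrite each other.

```python
from pathlib import Path

import pandas as pd


def split_csv_by_column(csv_path, out_dir, column="VAL"):
    """Write one CSV per distinct value of `column`; return the paths written."""
    csv_path = Path(csv_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(csv_path)
    written = []
    for value, group in df.groupby(column):
        # Hypothetical naming scheme: prefix with the source file's stem so
        # that 001.csv and 002.csv do not both write to val_0.csv.
        target = out_dir / f"{csv_path.stem}_val_{value}.csv"
        group.to_csv(target, index=False)
        written.append(target)
    return written


def split_all(in_dir, out_dir, pattern="*.csv", column="VAL"):
    """Apply split_csv_by_column to every file in in_dir matching pattern."""
    written = []
    # sorted() materialises the file list before any output is written.
    for csv_path in sorted(Path(in_dir).glob(pattern)):
        written.extend(split_csv_by_column(csv_path, out_dir, column))
    return written
```

Calling `split_all("input_folder", "output_folder")` (both folder names are placeholders) would then process every CSV in the input folder. Using a separate output folder keeps a second run from picking up the files written by the first.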

edited Nov 27, 2020 at 9:10

answered Dec 15, 2019 at 1:58

Umar.H


6 Comments

AMC Over a year ago

Isn’t that f-string in the loop unnecessary? Wouldn’t y.to_csv(f'val_{x}.csv', index=False) do the trick, instead of creating a new column etc?

AMC Over a year ago

I’m not sure what flexibility there is to gain? In any case, aren’t the .tolist() and .unique() unnecessary?

AMC Over a year ago

Also, even if OP does need the whole DataFrame to generate the file names, how does creating an entire column make sense when, in the end, we only need a single file name per group?

AMC Over a year ago

I just realized that the unique() and indexing are directly related to my second comment, so my first one doesn’t make much sense on its own lol

AMC Over a year ago

I think it’s safe to assume that the FILE column was just created as part of an attempt to solve this, no? I have a difficult time imagining a situation where that would be the best solution. As for the f-string, I was referring to the fact that in the loop, I believe file_name is already a string. Even if it isn’t a string, a simple str() call should suffice.


AMC · Accepted Answer · 2019-12-15 03:51:22Z

2

Here is an example where column C plays the role of your column VAL:

from io import StringIO

import pandas as pd

data = """
A,B,C
5d8b,N,1
5d8b,A,1
5d8b,B,2
5d8b,C,2
5d8b,Y,3
5d8b,X,3
"""

df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby('C'):
    group.to_csv(f'df_{key}.csv', index=False)
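If the separate DataFrames are needed in memory as well as on disk, as the question describes, the same `groupby` can feed a dictionary. This is a small sketch that reuses the sample data above; the name `frames` and the `reset_index(drop=True)` call are illustrative choices, not part of the original answer.

```python
from io import StringIO

import pandas as pd

data = """A,B,C
5d8b,N,1
5d8b,A,1
5d8b,B,2
5d8b,C,2
5d8b,Y,3
5d8b,X,3
"""

df = pd.read_csv(StringIO(data))

# Map each distinct value of C to the sub-DataFrame of rows holding it.
# reset_index(drop=True) renumbers each piece from 0; drop it to keep
# the original row labels.
frames = {key: group.reset_index(drop=True) for key, group in df.groupby("C")}

# Each piece can then be inspected or exported individually.
for key, frame in frames.items():
    print(key, len(frame))
```

Each entry, for example `frames[2]`, is an ordinary DataFrame, so it can be passed to `to_csv` later or processed further before export.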

edited Dec 15, 2019 at 3:51

AMC


answered Dec 15, 2019 at 1:31

GiovaniSalazar
