I have a dataset with multiple columns, but I am only interested in one column called 'VAL'. Every value in this column ranges from 0 to 4, so I would like to split the data into 5 separate data frames, one per distinct value, and then export each of these data frames to its own CSV file.

I have been able to sort the values using pandas, but now I need to divide the data into smaller datasets. I have multiple files I would like to do this to, so possibly a for loop?

This is what I currently have as output:

 A       B      C      D      E      F      G         VAL   FILE
954     380    158    166    431    201    769         0  001.csv
1142    348    203    962      0    878   1023         0  001.csv
1688    279    229      0    488   1007      0         0  001.csv
4792    371    420     29    372      0    745         0  001.csv
2106    352     76    196    388      0    695         0  001.csv
    ...    ...    ...    ...    ...    ...       ...      ...
5634    441    283    277    788     45    585         4  001.csv
827     672    606     24   1023    463    742         4  001.csv
6703    324    203      0    623    214    726         4  001.csv
9056    604    398      0    981      0    633         4  001.csv
0       574    338    144    942    608    793         4  001.csv

This is roughly what I would like it to look like:

 A       B      C      D      E      F      G         VAL   FILE
954     380    158    166    431    201    769         0  val_0.csv
1142    348    203    962      0    878   1023         0  val_0.csv
1688    279    229      0    488   1007      0         0  val_0.csv
4792    371    420     29    372      0    745         0  val_0.csv
2106    352     76    196    388      0    695         0  val_0.csv


 A       B      C      D      E      F      G         VAL   FILE
5634    441    283    277    788     45    585         4  val_4.csv
827     672    606     24   1023    463    742         4  val_4.csv
6703    324    203      0    623    214    726         4  val_4.csv
9056    604    398      0    981      0    633         4  val_4.csv
0       574    338    144    942    608    793         4  val_4.csv

2 Answers

Change your FILE column to match your expected output.

import pandas as pd
df = pd.read_clipboard(sep=r'\s+')  # split columns on runs of whitespace

Then group by VAL and write each group to its own CSV:

for group,data in df.groupby('VAL'):
    data.to_csv(f"val_{group}.csv",index=False)

This writes two CSV files for me from your data.


for group,data in df.groupby('VAL'):
    print(data)
          A    B    C    D    E     F     G VAL       FILE
0   954  380  158  166  431   201   769   0  val_0.csv
1  1142  348  203  962    0   878  1023   0  val_0.csv
2  1688  279  229    0  488  1007     0   0  val_0.csv
3  4792  371  420   29  372     0   745   0  val_0.csv
4  2106  352   76  196  388     0   695   0  val_0.csv
       A    B    C    D     E    F    G VAL       FILE
6   5634  441  283  277   788   45  585   4  val_4.csv
7    827  672  606   24  1023  463  742   4  val_4.csv
8   6703  324  203    0   623  214  726   4  val_4.csv
9   9056  604  398    0   981    0  633   4  val_4.csv
10     0  574  338  144   942  608  793   4  val_4.csv
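
The question also asks about repeating this for several input files. A minimal sketch of one way to do that follows, assuming the inputs are `*.csv` files in a single directory. The function names `split_csv_by_column` and `split_all`, and the `<input stem>_val_<value>.csv` naming scheme, are illustrative assumptions and not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def split_csv_by_column(csv_path, out_dir, column="VAL"):
    """Write one CSV per distinct value of `column`; return the paths written."""
    csv_path = Path(csv_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(csv_path)
    written = []
    for value, group in df.groupby(column):
        # Hypothetical naming: 001.csv with VAL == 4 becomes 001_val_4.csv,
        # so groups from different input files do not overwrite each other.
        target = out_dir / f"{csv_path.stem}_val_{value}.csv"
        group.to_csv(target, index=False)
        written.append(target)
    return written


def split_all(in_dir, out_dir, column="VAL"):
    """Apply split_csv_by_column to every *.csv file found in `in_dir`."""
    written = []
    # sorted() materialises the file list before any output is written.
    for csv_path in sorted(Path(in_dir).glob("*.csv")):
        written.extend(split_csv_by_column(csv_path, out_dir, column))
    return written
```

Calling `split_all("input_dir", "output_dir")` would then process every file in turn. Writing to a separate output directory is probably the safer choice, since it keeps the generated files from being picked up as inputs on a later run.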

6 Comments

Isn’t that f-string in the loop unnecessary? Wouldn’t y.to_csv(f'val_{x}.csv', index=False) do the trick, instead of creating a new column etc?
I’m not sure what flexibility there is to gain? In any case, aren’t the .tolist() and .unique() unnecessary?
Also, even if OP does need the whole DataFrame to generate the file names, how does creating an entire column make sense when, in the end, we only need a single file name per group?
I just realized that the unique() and indexing are directly related to my second comment, so my first one doesn’t make much sense on its own lol
I think it’s safe to assume that the FILE column was just created as part of an attempt to solve this, no? I have a difficult time imagining a situation where that would be the best solution. As for the f-string, I was referring to the fact that in the loop, I believe file_name is already a string. Even if it isn’t a string, a simple str() call should suffice.

Here is an example, where column C plays the role of your column VAL:

from io import StringIO

import pandas as pd

data = """
A,B,C
5d8b,N,1
5d8b,A,1
5d8b,B,2
5d8b,C,2
5d8b,Y,3
5d8b,X,3
"""

df = pd.read_csv(StringIO(data), sep=',')

for key, group in df.groupby('C'):
    group.to_csv(f'df_{key}.csv', index=False)
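
If the separate data frames are also needed in memory, as the question mentions, the same `groupby` can feed a dictionary instead of writing straight to disk. This is a small sketch reusing the example data above. The name `frames` is an assumption, and resetting the index is optional.

```python
from io import StringIO

import pandas as pd

data = """A,B,C
5d8b,N,1
5d8b,A,1
5d8b,B,2
5d8b,C,2
5d8b,Y,3
5d8b,X,3
"""

df = pd.read_csv(StringIO(data))

# Map each distinct value of C to its own DataFrame.
# reset_index(drop=True) renumbers the rows of each piece from 0.
frames = {key: group.reset_index(drop=True) for key, group in df.groupby("C")}

# Each piece can still be exported later, e.g.:
# frames[2].to_csv("df_2.csv", index=False)
```

Here `frames[2]` would hold only the rows where C equals 2, so each subset can be inspected or processed further before it is exported.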
