1

I have multiple files .csv, containing the results of training/validating process. One file per model. Each line in a file contains the following information: Epoch,loss_train,acc_train,loss_val,acc_val,time. Each filename contains information on model parameters.

I need to construct a dataframe that contains the last line of each file and the filename. Using readlines() in Python, I managed to get a list of strings with the desired information. Example of the part of the list:

'"0_0_2_200_0.4.csv",66,67,0.42319968342781067,0.8733666720438781,0.9848468899726868,0.7532656023222061,0.2503340244293213\n', '"0_0_2_200_0.5.csv",74,75,0.41233333945274353,0.8760283916760768,0.9206098318099976,0.7656023222060958,0.2535388469696045\n',

How can I put this list into dataframe?

To transform this list of lists into dataframe, I tried:

df = pd.DataFrame.from_records(results, columns = ["filename", "row_number", "Epoch", "loss_train", "acc_train","loss_val", "acc_val", "time"])

I've got the following error: ValueError: 8 columns passed, passed data had 124 columns. I tried:

df2 = pd.DataFrame(results,
                   columns=["filename", "row_number", "Epoch", "loss_train", "acc_train","loss_val", "acc_val", "time"])

The error is: ValueError: Shape of passed values is (110, 1), indices imply (110, 8).

df3 = pd.DataFrame(results)

results into a dataframe with only one column. I tried to write the list into a csv file:

file = open('final_results.csv', 'w+', newline ='') 
with file:     
    write = csv.writer(file, delimiter=',') 
    write.writerows(results)

But in the resulting file, all the characters are split into different columns:

"""",0,_,0,_,2,_,2,0,0,_,0,.,4,.,c,s,v,"""",",",6,6,",",6,7,",",0,.,4,2,3,1,9,9,6,8,3,4,2,7,8,1,0,6,7,",",0,.,8,7,3,3,6,6,6,7,2,0,4,3,8,7,8,1,",",0,.,9,8,4,8,4,6,8,8,9,9,7,2,6,8,6,8,",",0,.,7,5,3,2,6,5,6,0,2,3,2,2,2,0,6,1,",",0,.,2,5,0,3,3,4,0,2,4,4,2,9,3,2,1,3,"

1 Answer 1

1

With the list of strings you provided:

results = [
    '"0_0_2_200_0.4.csv",66,67,0.42319968342781067,0.8733666720438781,0.9848468899726868,0.7532656023222061,0.2503340244293213\n',
    '"0_0_2_200_0.5.csv",74,75,0.41233333945274353,0.8760283916760768,0.9206098318099976,0.7656023222060958,0.2535388469696045\n',
]

Here is one way to do it using Python str.strip and str.split:

import pandas as pd

df = pd.DataFrame(
    [x.strip("\n").split(",") for x in results],
    columns=[
        "filename",
        "row_number",
        "Epoch",
        "loss_train",
        "acc_train",
        "loss_val",
        "acc_val",
        "time",
    ],
)

Then:

              filename row_number Epoch           loss_train  \
0  "0_0_2_200_0.4.csv"         66    67  0.42319968342781067   
1  "0_0_2_200_0.5.csv"         74    75  0.41233333945274353   

            acc_train            loss_val             acc_val  \
0  0.8733666720438781  0.9848468899726868  0.7532656023222061   
1  0.8760283916760768  0.9206098318099976  0.7656023222060958   

                 time  
0  0.2503340244293213  
1  0.2535388469696045
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.