
I'm making a file compiler that takes data from CSV files and, after every 8 files read, concatenates the data into a separate CSV file. The program is taking quite a long time to do this. Is there a more optimized way to go about it?

I'm currently reading each CSV file into a pandas DataFrame, then appending the DataFrames to a list so I can combine them with pd.concat() afterwards.

Edit: The inputs to the pd.read_csv call are the root directory and the name of the file being read, since I'm using os.walk to jump from folder to folder. Each folder contains an inconsistent number of CSV files storing data for a model's MSE, RMSE, and MAE. The reason I'm using a DataFrame is that I'm trying to use the data in each CSV file for further analysis (the reason it concatenates every 8 files is that each model has 8 outputs). All CSV files have one header row and are 6 columns by 5 rows.

code snippet:

import os
import pandas as pd

data = []

# Reading the file (tab-separated) into a DataFrame
data_value = pd.read_csv(os.path.join(root, file), sep='\t')

# Appending the DataFrame to a list
data.append(data_value)

# Concatenating all DataFrames in the list into one DataFrame
pd.concat(data)
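Given the setup described in the edit, one way to structure the whole walk is a sketch like the following. This is a hedged example, not the poster's actual program: the function name `compile_batches`, the `batch_size` default of 8, and the tab separator are assumptions taken from the question.

```python
import os
import pandas as pd

def compile_batches(top_dir, batch_size=8, sep='\t'):
    """Walk top_dir, read each CSV once, and concatenate every
    batch_size files into one DataFrame. Yields one DataFrame
    per completed batch."""
    batch = []
    for root, _dirs, files in os.walk(top_dir):
        for name in sorted(files):
            if not name.endswith('.csv'):
                continue
            batch.append(pd.read_csv(os.path.join(root, name), sep=sep))
            if len(batch) == batch_size:
                yield pd.concat(batch, ignore_index=True)
                batch = []
    if batch:  # leftover files in a final, incomplete batch
        yield pd.concat(batch, ignore_index=True)
```

Appending small DataFrames to a list and calling pd.concat once per batch (as the snippet above already does) is the fast pattern; the thing to avoid is concatenating inside the loop, which recopies all previously read rows on every iteration.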
  • It isn't clear what you are trying to accomplish exactly. Can you please describe exactly what your inputs are (a list of file paths?) and what output you are looking for? (You are creating a DataFrame in your code, but you said you want to output into a separate CSV, so are you just trying to aggregate every 8 files into 1 file on disk, or do you actually need a DataFrame?) Is your only purpose in using pandas to read/write the CSV, or are you actually using the DataFrames? Do the CSV files have identical structure? What is that structure, approximately (is there a header row?)? Commented Feb 25 at 19:23
  • Sorry, it's my first time posting here. I made an edit to the post for more information. Thanks for trying to help out! Commented Feb 25 at 21:54
  • Why do you need to use pandas at all? Just concatenate the files directly. The only complication may be filtering out the duplicate header lines. Commented Feb 25 at 22:04
  • I was going to use the mean and stdev functions in pandas on each column and have the new concatenated file include these values at the bottom of the table. Would it be better to just use shutil to combine the files and then read the result into a DataFrame for that? Commented Feb 25 at 22:19
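A minimal sketch of the direction suggested in the comments: concatenate the files as plain text (keeping only the first header line), then make a single pandas pass over the combined file for the summary rows. The helper name `concat_with_stats` and the tab separator are assumptions based on the question, not code from the thread.

```python
import pandas as pd

def concat_with_stats(paths, out_path, sep='\t'):
    # Concatenate the raw files, writing the header line only once.
    with open(out_path, 'w') as out:
        for i, path in enumerate(paths):
            with open(path) as src:
                header = src.readline()
                if i == 0:
                    out.write(header)
                body = src.read()
                out.write(body if body.endswith('\n') else body + '\n')
    # One pandas pass over the combined file for the mean/stdev rows.
    df = pd.read_csv(out_path, sep=sep)
    stats = df.agg(['mean', 'std'])
    # index=False keeps the stats rows aligned with the data columns
    # (at the cost of dropping the 'mean'/'std' row labels).
    stats.to_csv(out_path, sep=sep, mode='a', header=False, index=False)
    return df
```

This assumes all columns are numeric, as they would be for MSE/RMSE/MAE values; non-numeric columns would need to be excluded before calling `agg`.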

1 Answer


As stated by others, this question is too generic and doesn't provide much information about the issue. However, the best thing you can do is simply read all the files separately and concatenate them in a single call, rather than building up the list by appending constantly.

df1 = pd.read_csv(path_to_file1, ...)
df2 = pd.read_csv(path_to_file2, ...)
df3 = pd.read_csv(path_to_file3, ...)
df4 = pd.read_csv(path_to_file4, ...)
df5 = pd.read_csv(path_to_file5, ...)
df6 = pd.read_csv(path_to_file6, ...)
df7 = pd.read_csv(path_to_file7, ...)
df8 = pd.read_csv(path_to_file8, ...)

df_final = pd.concat(
  [df1, df2, df3, df4, df5, df6, df7, df8],
  **kwargs
)
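For a variable number of files, the eight separate variables above can be collapsed into a list built in one expression, with a single pd.concat call at the end. The helper name `concat_files` and the tab separator are assumptions carried over from the question.

```python
import pandas as pd

def concat_files(paths, sep='\t'):
    # Read each file once, then concatenate everything in a single call.
    return pd.concat(
        [pd.read_csv(p, sep=sep) for p in paths],
        ignore_index=True,
    )
```

This is equivalent in cost to appending to a list in a loop and concatenating once at the end; the expensive anti-pattern is calling pd.concat inside the loop.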

Or you could concatenate just 2 files per execution, store the resulting file, and repeat until only two files remain to concatenate. Note that by "recursively" I don't mean coding a recursive function, since that would be too memory-costly. Create a script that concatenates 2 files and stores the result, then use that result as one of the DataFrames to concatenate in the next execution of the script.


3 Comments

Why would this be more efficient? The efficiency here would be the same. Also, "Create a script to concat 2 files and store the result and then use that result as one of the dfs to concat in the next execution of the script" would be very inefficient. Don't do that.
@juanpa.arrivillaga To be fair, barely anything else could be said with what I had in hand. But I won't deny that you are right.
If the question is unclear, don't bother answering it, wait for them to improve the question.
