How can I export multiple pandas DataFrames into one .csv file?

Question

I have written a piece of code that reads in one .fasta file, analyzes a single genetic sequence, performs calculations based on that sequence, and organizes the results into a single pandas DataFrame, which is then exported as a .csv file.

I recently updated the code so that it can parse a .fasta file containing multiple sequences. That part works, but in its current form the code exports one .csv file per sequence. When the .fasta file contains many sequences (over 100, for example), sorting through that many .csv files becomes laborious.

Instead, I would like all of the DataFrames to be exported to a single .csv file, but I am not sure how to structure the code to do this. Right now, the code is built around a for loop that iterates over a dict (where the sequences from the .fasta file are stored). In each iteration, one function builds a dict of the relevant calculation results, and another function creates a pandas DataFrame from that dict, which is then exported as a .csv file:

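import pandas as pd
from os import path

# seq_dict, calculator_func, data_assembler, output_dir and project_name are defined elsewhere
for seq_id, sequence in seq_dict.items():
    result_dict = calculator_func(sequence)    # dict of calculation results
    results_df = data_assembler(result_dict)   # DataFrame built from that dict
    # one .csv file per sequence
    results_df.to_csv(path.join(output_dir, "{}_{}_dataframe.csv".format(project_name, seq_id)))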

It should also be noted that the indices of the dataframes are all based on the numerical positions within the relevant sequence.

In any case, I am having a hard time figuring out how to combine all the DataFrames into one .csv file such that the index lets the user tell (a) which sequence a row comes from and (b) which position within that sequence the row refers to. Can anybody recommend an approach?

sbonin · Accepted Answer · 2017-10-27 19:04:45Z


You can set your index to whatever you want, including a string. Try this example:

import pandas as pd

test_frame = pd.DataFrame({"Sequence": [1, 2], "Position": [3, 4]})
# Build a string index that combines the sequence and position values
test_frame.index = ("Sequence:" + test_frame["Sequence"].astype(str)
                    + "_" + "Position:" + test_frame["Position"].astype(str))
print(test_frame)
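Applied to several per-sequence frames, the same idea gives each row a label that records both its sequence and its position. A minimal sketch, assuming each frame is already indexed by position (as the question describes); the sequence IDs `seq1`/`seq2` and the `gc` column are hypothetical:

```python
import pandas as pd

# Hypothetical per-sequence results; the index is the position within the sequence
frames = {
    "seq1": pd.DataFrame({"gc": [0.5, 0.6]}, index=[1, 2]),
    "seq2": pd.DataFrame({"gc": [0.4]}, index=[1]),
}

labelled = []
for seq_id, df in frames.items():
    df = df.copy()  # avoid modifying the original frame
    # Combine sequence ID and position into one string label per row
    df.index = [f"Sequence:{seq_id}_Position:{pos}" for pos in df.index]
    labelled.append(df)

combined = pd.concat(labelled)
print(combined)
```

`combined.to_csv(...)` would then write every row to one file, with the combined label in the first column.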


4 Comments

Bob McBobson Over a year ago

I know that you can set the index of a DataFrame to whatever you want, but how can I export all the DataFrames produced by the for loop into one single .csv file? Should I create an empty DataFrame before the loop and then fill it in each iteration? How should I then structure the indices? Let me know if you need to see how my functions are structured.

sbonin Over a year ago

Do all of the dataframes have the same column names? After you assign the new indices, you can concatenate them into one large master dataframe and export that. For example: master_frame = pd.concat([test_frame1, test_frame2]) (an older equivalent is master_frame = master_frame.append(test_frame2), but DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0). See pandas.pydata.org/pandas-docs/stable/merging.html
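To illustrate why the column names matter, here is a short sketch with hypothetical frames (`test_frame3` and its `Score` column are made up for illustration). pd.concat stacks rows and aligns columns by name, so a column missing from one frame is filled with NaN rather than raising an error:

```python
import pandas as pd

test_frame1 = pd.DataFrame({"Sequence": [1, 2], "Position": [3, 4]})
test_frame2 = pd.DataFrame({"Sequence": [5], "Position": [6]})

# Same column names: the rows are simply stacked
master_frame = pd.concat([test_frame1, test_frame2], ignore_index=True)

# Differing columns: concat aligns by column name and fills the gaps with NaN
test_frame3 = pd.DataFrame({"Sequence": [7], "Score": [0.9]})
mixed = pd.concat([master_frame, test_frame3], ignore_index=True)

print(master_frame)
print(mixed)
```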

Bob McBobson Over a year ago

I ended up merging the dataframes by making an empty list before the for loop, appending each built dataframe to the list, and then using final_dataframe = pd.concat(total_list_of_dfs) to make the final dataframe. Thanks for your help!
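A minimal sketch of that list-then-concat approach, extended with pd.concat's `keys=` and `names=` parameters so that the combined index records both the sequence ID and the position (points a and b in the question). `build_results` and the contents of `seq_dict` are hypothetical stand-ins for the question's calculator_func/data_assembler and the parsed .fasta data, and an io.StringIO buffer stands in for the output file:

```python
import io
import pandas as pd

# Hypothetical stand-in for calculator_func + data_assembler:
# returns one DataFrame per sequence, indexed by 1-based position.
def build_results(sequence):
    is_gc = [1.0 if base in "GC" else 0.0 for base in sequence]
    return pd.DataFrame({"base": list(sequence), "is_gc": is_gc},
                        index=range(1, len(sequence) + 1))

seq_dict = {"seq1": "ATG", "seq2": "GC"}  # hypothetical parsed .fasta data

total_list_of_dfs = []
seq_ids = []
for seq_id, sequence in seq_dict.items():
    total_list_of_dfs.append(build_results(sequence))
    seq_ids.append(seq_id)

# keys= adds the sequence ID as an outer index level; names= labels both levels
final_dataframe = pd.concat(total_list_of_dfs, keys=seq_ids,
                            names=["sequence", "position"])

buffer = io.StringIO()             # stands in for a real output file path
final_dataframe.to_csv(buffer)     # both index levels become the first two columns
csv_text = buffer.getvalue()
print(csv_text)

# Reading the file back restores the (sequence, position) MultiIndex
restored = pd.read_csv(io.StringIO(csv_text), index_col=["sequence", "position"])
```

In a real script, to_csv would receive a path such as one built with path.join(output_dir, ...) instead of the buffer. Keeping sequence ID and position as two separate index columns is usually easier to filter or sort later than a single combined string label.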

sbonin Over a year ago

Awesome, hope I helped!
