I have written a piece of code that reads in one .fasta file, analyzes a single genetic sequence, makes calculations based on that sequence, and then organizes the results into a single pandas dataframe, which is subsequently exported as a .csv file.
I have updated the code recently in order for it to parse a .fasta file that contains multiple sequences, and although I figured out how to do it, the code in its current form exports one .csv file per sequence. When the .fasta file contains many sequences (over 100, for example), having to sort through so many .csv files might be somewhat laborious.
So instead I am trying to have all of the pandas dataframes exported to a single .csv file. However, I am not sure how to set up the code to make this happen. Right now, the code is built around a for loop that iterates over a dict (where the sequences from the .fasta file are stored). In each iteration, one function is called that creates a dict full of the pertinent calculation results, and another function is called that creates a pandas dataframe and fills it with the information from that dict. The dataframe is then exported as a .csv file.
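import pandas as pd
from os import path

# seq_dict holds the sequences parsed from the .fasta file
for seq in seq_dict:
    # Compute the results for this sequence and assemble them into a dataframe
    result_dict = calculator_func(seq_dict[seq])
    results_df = data_assembler(result_dict)
    # Export the dataframe as a .csv file
    results_df.to_csv(path.join(output_dir, "{}_dataframe.csv".format(project_name)))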
It should also be noted that the indices of the dataframes are all based on the numerical positions within the relevant sequence.
In any case, I am having a hard time figuring out exactly how I should combine all the dataframes into one .csv file such that the indices let the user tell a. which sequence a row comes from and b. which position within that sequence the row is based on. Can anybody recommend an approach?
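To make the goal concrete, here is a minimal sketch of one approach that might fit: collect the dataframes in a dict and combine them with `pd.concat`, which turns the dict keys into an outer index level. The bodies of `calculator_func` and `data_assembler`, the contents of `seq_dict`, the column names, and the 1-based position index are all hypothetical stand-ins for illustration, not the real code.

```python
import io

import pandas as pd


# Hypothetical stand-in: the real calculations are different.
def calculator_func(sequence):
    """Return toy per-position results for one sequence."""
    return {
        "base": list(sequence),
        "is_gc": [base in "GC" for base in sequence],
    }


# Hypothetical stand-in: assumes a 1-based position index.
def data_assembler(result_dict):
    """Build a dataframe indexed by position within the sequence."""
    df = pd.DataFrame(result_dict)
    df.index = pd.RangeIndex(start=1, stop=len(df) + 1)
    return df


# Hypothetical input: sequence name -> sequence string.
seq_dict = {"seq_A": "ATGC", "seq_B": "GGA"}

# Collect one dataframe per sequence instead of writing each one to disk.
frames = {
    name: data_assembler(calculator_func(sequence))
    for name, sequence in seq_dict.items()
}

# Concatenating a dict uses its keys as the outer index level, so every
# row is labelled with both its sequence name and its position.
combined = pd.concat(frames, names=["sequence", "position"])

# Written to an in-memory buffer here; passing a file path works the same way.
buffer = io.StringIO()
combined.to_csv(buffer)
csv_text = buffer.getvalue()
```

In the resulting file the first two columns are the sequence name and the position, so a single call such as `pd.read_csv(csv_path, index_col=["sequence", "position"])` (with `csv_path` being wherever the file was saved) should restore the two-level index.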