I'm working on a project where I need to process a lot of CSV files filled with data like this:
CSV1.csv
A B C D ...
1 1980 1 0.9 0.8
2 2003 0.9 0.8 0.2
3 1665 0.7 0.2 0.4
4 1982 0.6 1 0.2
...
CSV2.csv
A E F G ...
1 1665 1 0.4 1
2 1980 0.4 0.8 0.6
3 2003 0.1 0.3 0.9
4 1982 0.3 1 0.6
...
All of the CSV files contain the same set of values in column A, but in a different order in each file. I'm importing all the files like this:
import glob
import pandas as pd

path = r"/Users/.../folder"
all_files = glob.glob(path + "/*.CSV")
all_csv = (pd.read_csv(f, sep=',') for f in all_files)
df_merged = pd.concat(all_csv, axis=1, ignore_index=False)
It gets merged, but the rows end up disorganized.
Sorting afterwards with df_merged.sort_values() doesn't work, because after the merge there is no single column left that preserves the order of A. I know I could import each CSV file manually and sort it individually, but there are 394 of them...
I feel like something like this might be applicable to a large import of CSV files, but I don't know how to run a line of code on each file before the dataframes get combined (all_csv is a generator object).
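What I have in mind is something like the sketch below: sort each frame by A before the concat by making A the index, so that pd.concat(axis=1) aligns rows on the shared index instead of on file order. This is an untested sketch with two tiny hypothetical files standing in for the real 394:

```python
import glob
import os
import tempfile

import pandas as pd

# Two tiny CSV files standing in for the real 394, with the same
# A values appearing in a different order in each file.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "CSV1.csv"), "w") as f:
    f.write("A,B,C\n1980,1,0.9\n2003,0.9,0.8\n1665,0.7,0.2\n")
with open(os.path.join(tmpdir, "CSV2.csv"), "w") as f:
    f.write("A,E,F\n1665,1,0.4\n1980,0.4,0.8\n2003,0.1,0.3\n")

all_files = sorted(glob.glob(os.path.join(tmpdir, "*.csv")))

# Make A the index and sort it inside the generator, i.e. per file,
# before any combination happens; concat(axis=1) then aligns rows
# on the shared index, so the original row order no longer matters.
all_csv = (pd.read_csv(f, sep=',').set_index("A").sort_index()
           for f in all_files)
df_merged = pd.concat(all_csv, axis=1)
print(df_merged)
```

A side effect of this version is that A becomes the index rather than a repeated column, which would make the deduplication step below unnecessary.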
P.S. At the end I run this to drop the repeated A columns:
df_merged = df_merged.loc[:, ~df_merged.columns.duplicated()]