I am basically trying to concatenate many csv files of time series type in Pandas I was certain I knew how to do it but apparently I don't.
Consider this simple example:
We have three csv files:
file1.csv=
Alfa, A1, A2, A3, A4
Beta, B1, B2, B3, B4
Gamma, C1, C2, C3, C4
file2.csv=
Delta, D1, D2, D3, D4
Epsilon, E1, E2, E3, E4
Zeta, F1, F2, F3, F4
file3.csv=
Eta, G1, G2, G3, G4
Theta, H1, H2, H3, H4
Iota, I1, I2, I3, I4
1st toy code:
[In]:
import pandas as pd
a = pd.read_csv("/Test/file1.csv", header=None)
b = pd.read_csv("/Test/file2.csv", header=None)
c = pd.read_csv("/Test/file3.csv", header=None)
colnames = ["GREEK", "ROW1", "ROW2", "ROW3", "ROW4"]
concatDf = pd.concat([a,b,c], axis=0)
concatDf.columns = colnames
print(concatDf)
[Out]:
GREEK ROW1 ROW2 ROW3 ROW4
0 Alfa A1 A2 A3 A4
1 Beta B1 B2 B3 B4
2 Gamma C1 C2 C3 C4
0 Delta D1 D2 D3 D4
1 Epsilon E1 E2 E3 E4
2 Zeta F1 F2 F3 F4
0 Eta G1 G2 G3 G4
1 Theta H1 H2 H3 H4
2 Iota I1 I2 I3 I4
Not pretty, but does what it is suppose to do. Since the names of the real CSV-file are very long and the number of files are high, I thought I should make things a bit easier for me. Hence 2nd toy code:
[In]:
import glob
import pandas as pd
allFiles = glob.glob("/Test/" + "*.csv")
dfList = []
for filename in allaFiler:
df = pd.read_csv(filename)
dfList.append(df)
concatDf = pd.concat(dfList, axis=0)
print(concatDf)
[Out]:
A1 A2 A3 A4 D1 D2 D3 D4 G1 G2 G3 G4 Alfa Delta Eta
0 B1 B2 B3 B4 NaN NaN NaN NaN NaN NaN NaN NaN Beta NaN NaN
1 C1 C2 C3 C4 NaN NaN NaN NaN NaN NaN NaN NaN Gamma NaN NaN
0 NaN NaN NaN NaN E1 E2 E3 E4 NaN NaN NaN NaN NaN Epsilon NaN
1 NaN NaN NaN NaN F1 F2 F3 F4 NaN NaN NaN NaN NaN Zeta NaN
0 NaN NaN NaN NaN NaN NaN NaN NaN H1 H2 H3 H4 NaN NaN Theta
1 NaN NaN NaN NaN NaN NaN NaN NaN I1 I2 I3 I4 NaN NaN Iota
file1.csv, file2.csv, and file3.csv are the only files in the Test map. I have checked the documentation, but I have failed to find an explanation.
My questions are as follows:
- Why doesn't code 2 yield the same result as code 1? I am clearly missing something very fundamental, so please help me to understand what is going on.
- What changes do I have to do to make code 2 yield the same result as code 1?
Thanks in advance!
list('ABCD')) before you append todfList.