I am trying to read 3 years of data files (one file per date), and the portion I am interested in is quite small (about 1.4 million rows in total) compared to the parent files (each about 90 MB and roughly 1.5 million rows). The code below has worked well for me in the past with a smaller number of files, but with 1095 files to process it is crawling, taking about 3-4 seconds to read each file. Any suggestions for making this more efficient/faster?
import pandas as pd
from glob import glob

file_list = glob(r'C:\Temp2\dl*.csv')

for file in file_list:
    print(file)
    df = pd.read_csv(file, header=None)
    # keep only the columns of interest
    df = df[[0, 1, 3, 4, 5]]
    # keep only the rows whose ID (column 0) is in det_list (defined elsewhere)
    df2 = df[df[0].isin(det_list)]
    if file_list[0] == file:
        rawdf = df2
    else:
        rawdf = rawdf.append(df2)
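For reference, here is a rough sketch of one direction I have in mind: read only the needed columns up front, filter each file, and concatenate everything once at the end instead of appending inside the loop. This assumes det_list is defined as above and that column 0 holds the IDs being matched.

import pandas as pd
from glob import glob

file_list = glob(r'C:\Temp2\dl*.csv')

frames = []
for file in file_list:
    # usecols skips the unneeded columns at parse time instead of
    # dropping them after the whole file has already been read
    df = pd.read_csv(file, header=None, usecols=[0, 1, 3, 4, 5])
    # keep only rows whose ID (column 0) is in det_list
    frames.append(df[df[0].isin(det_list)])

# concatenate once at the end rather than appending inside the loop,
# which copies the accumulated frame on every iteration
rawdf = pd.concat(frames, ignore_index=True)

Passing an explicit dtype= for the columns (if known) might also cut down the parsing time, but I have not measured that.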