I'm quite new to Pandas.
I'm trying to create a dataframe reading thousands of csv files.
The files are not structured in the same way, but I want to extract only columns I'm interested in, so I created a list which inlcudes all the column names I want, but then i have an error cause not all of them are included in each dataset.
import pandas as pd
import numpy as np
import os
import glob
# select the csv folder
csv_folder= r'myPath'
# select all xlsx files within the folder
all_files = glob.glob(csv_folder + "/*.csv")
# Set the column names to include in the dataframe
columns_to_use = ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6']
# read one by one all the excel
for filename in all_files:
df = pd.read_csv(filename,
header=0,
usecols = columns_to_use)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-0d9670495660> in <module>
1 for filename in all_files:
----> 2 df = pd.read_csv(filename,
3 header=0,
4 usecols = columns_to_use)
5
ValueError: Usecols do not match columns, columns expected but not found: ['Name1', 'Name2', 'Name4']
How could I handle this issue by including a columns if this is present in the list?