Pandas read csv using column names included in a list

Question

I'm quite new to pandas. I'm trying to build a DataFrame by reading thousands of CSV files.
The files are not all structured the same way, and I only want to extract the columns I'm interested in. I created a list containing all the column names I want, but I get an error because not every file contains all of them.

import pandas as pd
import numpy as np
import os
import glob

# select the csv folder
csv_folder= r'myPath'

# select all csv files within the folder
all_files = glob.glob(csv_folder + "/*.csv")

# Set the column names to include in the dataframe
columns_to_use = ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6']

# read the csv files one by one
for filename in all_files:
    df = pd.read_csv(filename,
                     header=0,
                     usecols = columns_to_use)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-0d9670495660> in <module>
      1 for filename in all_files:
----> 2     df = pd.read_csv(filename,
      3                      header=0,
      4                     usecols = columns_to_use)
      5 

ValueError: Usecols do not match columns, columns expected but not found: ['Name1', 'Name2', 'Name4']

How can I handle this so that each column in the list is read only if it is present in the file?

Stef · Accepted Answer · 2021-02-10 11:17:38Z

Use a callable for usecols, i.e. df = pd.read_csv(filename, header=0, usecols=lambda c: c in columns_to_use). From the docs of the usecols parameter:

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.

Working example that will only read col1 and not throw an error on missing col3:

import pandas as pd
import io

s = """col1,col2
1,2"""

df = pd.read_csv(io.StringIO(s), usecols=lambda c: c in ['col1', 'col3'])
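
To apply the same idea across many files and end up with a single DataFrame, something like the following sketch might work. It is not part of the original answer: the helper name read_selected and the in-memory sources are hypothetical stand-ins (in the question they would be the paths returned by glob), and combining with pd.concat plus reindex is just one reasonable choice.

```python
import io

import pandas as pd

columns_to_use = ['Name1', 'Name2', 'Name3']


def read_selected(source, columns):
    """Read only those of `columns` that are present in `source` (hypothetical helper)."""
    wanted = set(columns)
    # The callable is evaluated against each column name in the file,
    # so names missing from the file never raise a ValueError.
    return pd.read_csv(source, header=0, usecols=lambda c: c in wanted)


# Hypothetical in-memory stand-ins for CSV files with different layouts.
sources = [
    io.StringIO("Name1,Name2,Other\n1,2,3\n"),
    io.StringIO("Name3,Name1\n4,5\n"),
]

frames = []
for source in sources:
    frames.append(read_selected(source, columns_to_use))

# concat aligns on column names; a column absent from a file is filled with NaN.
combined = pd.concat(frames, ignore_index=True, sort=False)

# reindex fixes the column order and adds any requested column that no file had.
combined = combined.reindex(columns=columns_to_use)
```

Collecting the frames in a list and concatenating once avoids overwriting df on every iteration, which is what the loop in the question does.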
