I'm using the following code to read a CSV file in chunks with pandas read_csv:

import pandas

headers = ["1", "2", "3", "4", "5"]
fields = ["1", "5"]

# fileName and chunkSize are defined elsewhere
for chunk in pandas.read_csv(fileName, names=headers, header=0, usecols=fields, chunksize=chunkSize):
    pass  # process each chunk here

Sometimes my CSV won't have column "5", and I want to handle this case and specify some default values. Is there a way to read just the headers of my CSV file without reading the whole file, so I can handle this manually? Or maybe some other clever way to default the value for the missing column?

  • Possibly set error_bad_lines=False. Commented Aug 2, 2017 at 14:46
  • @cᴏʟᴅsᴘᴇᴇᴅ The thing is, I need the value for column "5" for each row; however, sometimes the whole column "5" will be missing, so I have to fall back to default values. error_bad_lines=False will just ignore the row, no? Commented Aug 2, 2017 at 14:54
  • Yes, you're right. Not sure about this one. I always believed pandas would fill NaNs by default. Commented Aug 2, 2017 at 14:57

1 Answer

If you pass nrows=0, read_csv reads just the header row. You can then call intersection to find the columns that are both expected and actually present, which avoids any errors:

In[14]:
import io
import pandas as pd

t="""1,2,3,5,6
0,1,2,3,4"""
headers = ["1","2","3","4","5"]
fields = ["1", "5"]
cols = pd.read_csv(io.StringIO(t), nrows=0).columns
cols

Out[14]: Index(['1', '2', '3', '5', '6'], dtype='object')

Now that we have the actual column names, we can call intersection to find which of your expected columns are present in the file:

In[15]:
valid_cols = cols.intersection(headers)
valid_cols

Out[15]: Index(['1', '2', '3', '5'], dtype='object')

You can do the same with fields and then pass the result to your current code (for example as usecols) to avoid any exceptions.
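As a rough sketch of how this could fit into the chunked read from the question: the sample data, the default value of -1, and the names `defaults`, `valid_fields`, and `missing_fields` below are all assumptions made for illustration, not part of the original code. The sketch also relies on the file's own header row rather than passing names=headers.

```python
import io

import pandas as pd

# Hypothetical sample in which column "5" is absent from the file.
t = """1,2,3,4
0,1,2,3
4,5,6,7"""

fields = ["1", "5"]
defaults = {"5": -1}  # assumed default value for the missing column

# Read only the header row to learn which columns the file really has.
cols = pd.read_csv(io.StringIO(t), nrows=0).columns

# Split the wanted fields into those present in the file and those missing.
valid_fields = [f for f in fields if f in cols]
missing_fields = [f for f in fields if f not in cols]

chunks = []
for chunk in pd.read_csv(io.StringIO(t), usecols=valid_fields, chunksize=1):
    # Fill every missing column with its default value.
    for name in missing_fields:
        chunk[name] = defaults[name]
    # Restore the requested column order.
    chunks.append(chunk[fields])

result = pd.concat(chunks, ignore_index=True)
```

With a real file you would pass the file name instead of io.StringIO(t) in both read_csv calls; only the header line is parsed by the first call.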

Just to demonstrate that passing nrows=0 just reads the header row:

In[16]:
pd.read_csv(io.StringIO(t), nrows=0)

Out[16]: 
Empty DataFrame
Columns: [1, 2, 3, 5, 6]
Index: []

2 Comments

Yeah, I just found out about nrows, but I was about to test it with nrows=1; I didn't know the count starts from 0 (should've guessed). I will give it a try, thanks!
Yeah, it's not obvious that you can do this. I will update the answer to prove it.
