I have the following sample .csv file:

str_header  int_header
string_a       1
string_b       2
string_c       3

According to solutions on the internet, this code:

import pandas as pd
data = pd.read_csv("z.csv", names=['int_header'])
print(data['int_header'])

should read only the int_header column into data. But when I print data as above, it actually contains all of the file's columns. I'm using the Anaconda distribution of Python. What's wrong?

Answer

Try this instead:

data = pd.read_csv("z.csv", usecols=['int_header'])

This assumes that your CSV file uses a comma (,) as its delimiter.
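A self-contained sketch of that call, assuming the file contents are inlined with io.StringIO instead of being read from z.csv. If the real file is whitespace-aligned like the sample in the question rather than comma-separated, sep=r"\s+" tells read_csv to split on runs of whitespace instead:

```python
import io

import pandas as pd

# Hypothetical stand-in for z.csv, written with commas as the delimiter.
comma_csv = "str_header,int_header\nstring_a,1\nstring_b,2\nstring_c,3\n"

# usecols keeps only the listed columns; the header row still supplies the names.
data = pd.read_csv(io.StringIO(comma_csv), usecols=['int_header'])
print(data)

# Hypothetical whitespace-aligned variant, matching how the sample is displayed.
space_csv = (
    "str_header  int_header\n"
    "string_a       1\n"
    "string_b       2\n"
    "string_c       3\n"
)
# Split on runs of whitespace instead of commas, then select the column as before.
data_ws = pd.read_csv(io.StringIO(space_csv), sep=r"\s+", usecols=['int_header'])
print(data_ws)
```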

Explanation:

Docs:

names : array-like, default None

List of column names to use. If file contains no header row, then you should explicitly pass header=None

usecols : array-like, default None

Return a subset of the columns. Results in much faster parsing time and lower memory usage.
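A minimal sketch of both parameters as the docs describe them, using the sample data inlined with io.StringIO (the headerless variant is a hypothetical version of the file). It uses names together with header=None for a file that has no header row, and usecols given as column labels, integer positions, or a callable applied to each column label:

```python
import io

import pandas as pd

# Hypothetical headerless version of the file: names supplies the labels,
# and header=None says there is no header row to read them from.
no_header = "string_a,1\nstring_b,2\nstring_c,3\n"
named = pd.read_csv(io.StringIO(no_header), header=None,
                    names=['str_header', 'int_header'])

with_header = "str_header,int_header\nstring_a,1\nstring_b,2\nstring_c,3\n"

# usecols selects columns; each of these keeps only int_header.
by_label = pd.read_csv(io.StringIO(with_header), usecols=['int_header'])
by_position = pd.read_csv(io.StringIO(with_header), usecols=[1])
by_callable = pd.read_csv(io.StringIO(with_header),
                          usecols=lambda col: col.startswith('int'))

print(named)
print(by_label)
```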

The documentation is a bit confusing.

names: used for naming the columns (giving them labels), especially if the file has no header line or you want to ignore/replace the existing one (to replace an existing header line, also pass header=0).

usecols: used for choosing only the "interesting" columns.
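To see what went wrong in the question, here is a sketch with the sample data inlined as comma-separated text. Because names is passed, pandas treats the header line as data, and because names lists one label for a file with two fields per row, pandas uses the leftover leading column as the index. That is why the string column still shows up when data['int_header'] is printed. Passing header=0 lets names replace the existing header instead (the labels 'label' and 'value' are hypothetical):

```python
import io

import pandas as pd

csv_text = "str_header,int_header\nstring_a,1\nstring_b,2\nstring_c,3\n"

# The call from the question: one name for two fields, and no header row assumed.
# The first field of every line (including the header line) becomes the index.
bad = pd.read_csv(io.StringIO(csv_text), names=['int_header'])
print(bad['int_header'])

# names used as intended: one label per column, and header=0 marks the first
# line as a header to be replaced rather than read as data.
renamed = pd.read_csv(io.StringIO(csv_text), names=['label', 'value'], header=0)
print(renamed)
```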

Comments

The docs confused me: "names : array-like, default None. List of column names to use. If file contains no header row, then you should explicitly pass header=None." So I thought this was the one to use...
The "names" argument is for the internal names used by pandas, not for selecting the actual columns in the CSV file.
@student1, I've added some explanation to my answer.
