Create dataframe from specific column

Question

I am trying to create a dataframe in Pandas from the AB column in my csv file. (AB is the 27th column).

I am using this line:

df = pd.read_csv(filename, error_bad_lines = False, usecols = [27])

... which is resulting in this error:

ValueError: Usecols do not match names.

I'm very new to Pandas. Could someone point out what I'm doing wrong?

A is the first column (index = 0), Z is the 26th, AA is the 27th, so AB should be the 28th (index = 27).
– Mohammad Athar, Commented Sep 7, 2016 at 16:25
You can also write usecols=['AB'] to avoid all that confusion.
– Nickil Maveli, Commented Sep 7, 2016 at 16:26
@user2539738 Wasn't sure if Pandas started with 0 for usecols. Anyway, the error persists.
– Harrison, Commented Sep 7, 2016 at 16:26
@NickilMaveli When I switch my line to df = pd.read_csv(filename, error_bad_lines = False, usecols = ['AB']) the error is still the same.
– Harrison, Commented Sep 7, 2016 at 16:27
Can you provide us with the first five or so lines in your file?
– Mad Physicist, Commented Sep 7, 2016 at 16:28
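
The letter-to-index arithmetic from the first comment can be sketched as a small helper. This is only an illustration: `column_letter_to_index` is a hypothetical name, not a pandas function, and it assumes spreadsheet-style labels (A, B, ..., Z, AA, AB, ...).

```python
def column_letter_to_index(letters: str) -> int:
    """Convert a spreadsheet-style column label to a 0-based column index.

    Hypothetical helper for illustration; not part of pandas.
    """
    if not letters:
        raise ValueError("column label must not be empty")
    position = 0  # 1-based position, treating the label as a base-26 number
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        position = position * 26 + (ord(ch) - ord("A") + 1)
    return position - 1  # shift to the 0-based index that usecols expects


print(column_letter_to_index("A"))   # 0
print(column_letter_to_index("Z"))   # 25
print(column_letter_to_index("AB"))  # 27
```

So a spreadsheet column labelled AB corresponds to usecols=[27], provided the file really has at least 28 columns.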

MaxU - stand with Ukraine · Accepted Answer · 2016-09-07 18:02:07Z

Here is a small demo:

CSV file (without a header, i.e. there are NO column names):

1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20

We are going to read only the 8th column:

In [1]: fn = r'D:\temp\.data\1.csv'

In [2]: df = pd.read_csv(fn, header=None, usecols=[7], names=['col8'])

In [3]: df
Out[3]:
   col8
0     8
1    18

P.S. Pay attention to header=None, usecols=[7], names=['col8'].

If you don't pass the header=None and names parameters, the first row will be used as a header:

In [6]: df = pd.read_csv(fn, usecols=[7])

In [7]: df
Out[7]:
    8
0  18

In [8]: df.columns
Out[8]: Index(['8'], dtype='object')

And if we want to read only the last (10th) column:

In [9]: df = pd.read_csv(fn, usecols=[10])
... skipped ...
ValueError: Usecols do not match names.

This fails because pandas counts columns starting from 0, so the 10th column has index 9 and we have to do it this way:

In [12]: df = pd.read_csv(fn, usecols=[9], names=['col10'])

In [13]: df
Out[13]:
   col10
0     10
1     20
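
One way to catch an out-of-range index before it fails is to read a single row first and count the columns. The sketch below assumes a headerless file like the demo above; the in-memory `csv_text`, the `wanted` variable, and the `col10` label are stand-ins chosen for illustration.

```python
import io

import pandas as pd

# Stand-in for the headerless demo file (assumption: same two rows as above).
csv_text = "1,2,3,4,5,6,7,8,9,10\n11,12,13,14,15,16,17,18,19,20\n"

# Read only the first row to learn how many columns the file has.
first_row = pd.read_csv(io.StringIO(csv_text), header=None, nrows=1)
n_columns = first_row.shape[1]

wanted = 9  # 0-based index of the 10th column
if wanted >= n_columns:
    raise IndexError(f"column index {wanted} is out of range for {n_columns} columns")

# Select the column by position, then give it a readable label.
df = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[wanted])
df.columns = ["col10"]
print(df)
```

Running the same check against the real file shows right away whether it has enough columns for index 27.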

DENDULURI CHAITANYA · Answer · 2016-09-07 16:29:13Z

-1

usecols uses the column name in your CSV file rather than the column number. In your case it should be usecols=['AB'] rather than usecols=[28]; that is the reason for your error stating that usecols do not match names.

1 Comment

MaxU - stand with Ukraine Over a year ago

usecols supports both positional column indexes and column names. From the docs:

All elements in this array must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s)
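
A minimal sketch of the two forms described in the quoted docs, assuming a small in-memory CSV whose header row names the columns A, B and C (this sample data is made up for illustration):

```python
import io

import pandas as pd

# Made-up sample with a header row, so column names can be inferred.
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

# Positional form: 0-based integer index into the document columns.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[2])

# Name form: a string matching a column name from the header row.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["C"])

# Both calls select the same single column.
print(by_position.equals(by_name))
```

The name form only works when the name actually exists, either in the header row or in the names argument, so a file without an 'AB' header cannot be read with usecols=['AB'] alone.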
