pandas read_csv parses the header as strings, but I want integers

Question

For example, the CSV file is as below; (1,2,3) is the header.

1,2,3
0,0,0

I read the CSV file using pd.read_csv and print a column:

import pandas as pd
df = pd.read_csv('./test.csv')
print(df[1])

This raises KeyError: 1.

It seems that read_csv parses the header as strings.

Is there any way to use integer type for the DataFrame column labels?

jezrael · Accepted Answer · 2018-03-12 06:51:45Z

7

I think the more general approach is to cast the column names to integers with astype:

df = pd.read_csv('./test.csv')
df.columns = df.columns.astype(int)
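
A minimal self-contained sketch of the astype approach. It assumes the file contents from the question and reads them from an in-memory io.StringIO buffer instead of ./test.csv; the names csv_text, labels_before and labels_after are only for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for ./test.csv (contents assumed from the question)
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))
# read_csv keeps header labels as strings, so df[1] would raise KeyError here
labels_before = list(df.columns)

# Cast the column labels to integers
df.columns = df.columns.astype(int)
labels_after = list(df.columns)

print(labels_before, labels_after)
print(df[1])  # integer lookup now works
```

Note that astype(int) fails if any header label is not a valid integer string.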

Another way is to first read only the header row, convert it to integers, and pass the result to the names parameter of read_csv:

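import csv

import numpy as np
import pandas as pd

# read only the header row and convert its labels to integers
with open("file.csv", "r") as f:
    reader = csv.reader(f)
    i = np.array(next(reader)).astype(int)

# another way
# i = pd.read_csv("file.csv", nrows=0).columns.astype(int)
print(i)
[1 2 3]

# skip the original header row and pass the integer labels as names
df = pd.read_csv("file.csv", names=i, skiprows=1)
print(df.columns)
Int64Index([1, 2, 3], dtype='int64')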

edited Mar 12, 2018 at 6:51

answered Mar 12, 2018 at 6:42

jezrael

4 Comments

이승훈 Over a year ago

Your general method is very good. I'm now facing another problem: my real DataFrame's columns are a MultiIndex, and df.columns.level[0].astype(int) raises TypeError: 'FrozenList' does not support mutable operations. Is there a good way to handle this?

jezrael Over a year ago

I think you need df.columns = [df.columns.get_level_values(0).astype(int), df.columns.get_level_values(1)]

cs95 Over a year ago

@이승훈 No, I think a better option would be using df.columns.set_levels.

jezrael Over a year ago

@이승훈 - Glad I could help!

cs95 · Accepted Answer · 2018-03-12 06:59:53Z

3

Skip the header row using skiprows=1 and header=None. This loads a DataFrame with default integer column labels starting from 0.

df = pd.read_csv('test.csv', skiprows=1, header=None).rename(columns=lambda x: x + 1)

df    
   1  2  3
0  0  0  0

The rename call is optional, but if you want your headers to start from 1, you may keep it in.
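
A small sketch of what this does, assuming the file contents from the question and using io.StringIO as a stand-in for test.csv (csv_text and shifted are illustrative names):

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (contents assumed from the question)
csv_text = "1,2,3\n0,0,0\n"

# Drop the header line and let pandas number the columns 0, 1, 2
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)
print(list(df.columns))

# Shift the positional labels so they start from 1
shifted = df.rename(columns=lambda x: x + 1)
print(list(shifted.columns))
```

The labels here are positions, not values read from the file, so they only match the original header when it is the consecutive sequence 1, 2, 3, ...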

If you have a MultiIndex, use set_levels to convert just level 0 to integers, passing that level's unique values:

df.columns = df.columns.set_levels(
    df.columns.levels[0].astype(int), level=0
)
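
A hedged, self-contained sketch of the MultiIndex case. The two-row header below is an assumed example read from an in-memory buffer. The sketch passes df.columns.levels[0], the unique values of level 0, because set_levels expects one entry per distinct label rather than one per column.

```python
import io

import pandas as pd

# Assumed example: two header rows, the first holding numeric labels
csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Level 0 holds the strings '1' and '2'; replace them with integers
df.columns = df.columns.set_levels(
    df.columns.levels[0].astype(int), level=0
)

print(df.columns.tolist())
print(df[1]["a"].tolist())  # select by integer label on level 0
```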

edited Mar 12, 2018 at 6:59

answered Mar 12, 2018 at 6:43

cs95

1 Comment

piRSquared Over a year ago

set_levels is a great option.

piRSquared · Accepted Answer · 2018-03-12 07:08:28Z

2

You can use set_axis in conjunction with a lambda and pd.Index.map

Consider a csv that looks like:

1,1,2,2
a,b,a,b
1,3,5,7
0,2,4,6

Read it like:

df = pd.read_csv('test.csv', header=[0, 1])
df

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

You can pipeline the column setting with integers in the first level like this (older pandas releases also needed inplace=False here; pandas 2.0 removed that argument):

df.set_axis(df.columns.map(lambda i: (int(i[0]), i[1])), axis=1)

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

answered Mar 12, 2018 at 7:08

piRSquared

Armali · Accepted Answer · 2019-05-09 07:45:14Z

1

Is there any way to use integer type for the DataFrame column labels?

I find this quite elegant:

df = pd.read_csv('test.csv').rename(columns=int)

Note that int here is the built-in function int().
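
Since rename accepts any callable, the same pattern can be made tolerant of non-numeric headers. The sketch below is an assumption-laden variation, not part of the original answer: to_int_if_numeric is a hypothetical helper, and the CSV contents are made up for illustration.

```python
import io

import pandas as pd


def to_int_if_numeric(label):
    """Hypothetical helper: return int(label) if int() accepts it, else the label unchanged."""
    try:
        return int(label)
    except (TypeError, ValueError):
        return label


# Assumed example with a mix of numeric and non-numeric header labels
csv_text = "1,2,name\n0,0,x\n"

df = pd.read_csv(io.StringIO(csv_text)).rename(columns=to_int_if_numeric)
print(list(df.columns))
```

With plain rename(columns=int), a header such as name would raise ValueError, so the fallback is only needed when the header is mixed.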

answered May 9, 2019 at 7:45

Armali

1 Comment

Brunox13 Over a year ago

Elegant, indeed!
