For example, the CSV file is as below, where (1,2,3) is the header:

1,2,3
0,0,0

I read the CSV file using pd.read_csv and print one column:

import pandas as pd
df = pd.read_csv('./test.csv')
print(df[1])

This raises KeyError: 1.

It seems that read_csv parses the header as strings.

Is there any way to use integer column labels in the DataFrame?

4 Answers

7

I think the more general approach is to cast the column names to integers with astype:

df = pd.read_csv('./test.csv')
df.columns = df.columns.astype(int)
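
To see the cast in action without a file on disk, here is a minimal sketch. It assumes the CSV content from the question and feeds it through io.StringIO; the names csv_text, string_labels and first_col are only illustrative.

```python
import io
import pandas as pd

# Stand-in for test.csv from the question: header row 1,2,3 and one data row.
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))
# read_csv keeps header labels as strings, so df[1] would raise KeyError here.
string_labels = list(df.columns)

# Cast the labels to integers; integer lookups then work.
df.columns = df.columns.astype(int)
first_col = df[1]
```

Note that astype(int) fails if any label is not a valid integer string, so this fits files whose header is entirely numeric.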

Another way is to first read only the first row (the header) and pass it to the names parameter of read_csv:

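import csv
import numpy as np
import pandas as pd
with open("file.csv", "r") as f:
    reader = csv.reader(f)
    # The first row is the header; convert its labels to integers.
    i = np.array(next(reader)).astype(int)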

#another way
#i = pd.read_csv("file.csv", nrows=0).columns.astype(int)
print (i)
[1 2 3]

df = pd.read_csv("file.csv", names=i, skiprows=1)
print (df.columns)
Int64Index([1, 2, 3], dtype='int64')
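
The commented-out nrows=0 variant can be run end to end as follows. This is a sketch under the assumption that the file has the same content as in the question; io.StringIO stands in for file.csv, and the labels are converted to a plain list before being passed to names.

```python
import io
import pandas as pd

# Stand-in for file.csv: header row 1,2,3 and one data row.
csv_text = "1,2,3\n0,0,0\n"

# nrows=0 parses only the header, giving an empty frame with string labels.
names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.astype(int).tolist()

# Re-read the data, skipping the header line and supplying the integer names.
df = pd.read_csv(io.StringIO(csv_text), names=names, skiprows=1)
```

The exact repr of df.columns depends on the pandas version; recent releases no longer print Int64Index.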

4 Comments

Your general method is very good. I am facing another problem: my real DataFrame's columns are a MultiIndex, and df.columns.levels[0].astype(int) raises TypeError: 'FrozenList' does not support mutable operations. Is there a good way to handle this?
I think you need df.columns = [df.columns.get_level_values(0).astype(int), df.columns.get_level_values(1)]
@이승훈 No, I think a better option would be using df.columns.set_levels.
@이승훈 - Glad I could help!
3

Skip the header row using skiprows=1 and header=None. This loads a DataFrame with default integer column labels starting from 0.

df = pd.read_csv('test.csv', skiprows=1, header=None).rename(columns=lambda x: x + 1)

df    
   1  2  3
0  0  0  0

The rename call is optional, but if you want your headers to start from 1, you may keep it in.
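
One caveat worth checking: with header=None the labels are positional (0, 1, 2, ...) and the values in the skipped header row are ignored. The sketch below uses a hypothetical header of 10,20,30 to show this; csv_text and positional are illustrative names.

```python
import io
import pandas as pd

# Hypothetical file whose header is not 1, 2, 3.
csv_text = "10,20,30\n0,0,0\n"

# The header line is skipped, so pandas assigns positional labels.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)
positional = list(df.columns)  # positions, not the 10/20/30 from the file
```

So this approach reproduces the original header only when it is a simple consecutive range that a rename like the one above can rebuild.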


If you have a MultiIndex, use set_levels to set just the 0th level to integer:

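# set_levels expects the unique level values, so use levels[0]
# rather than get_level_values(0), which repeats one label per column.
df.columns = df.columns.set_levels(
     df.columns.levels[0].astype(int), level=0
)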
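
A self-contained sketch of the MultiIndex case, assuming a small two-row header built in memory (csv_text is illustrative). It passes the unique level values from df.columns.levels[0] to set_levels, since set_levels replaces the level's distinct values rather than one value per column.

```python
import io
import pandas as pd

# Two header rows: level 0 is numeric (as text), level 1 is a/b.
csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Convert only the unique values of level 0; the codes are left untouched.
new_level0 = df.columns.levels[0].astype(int)
df.columns = df.columns.set_levels(new_level0, level=0)

# Columns can now be addressed with an integer first element.
col_2a = df[(2, "a")].tolist()
```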

1 Comment

set_levels is a great option.
2

You can use set_axis in conjunction with a lambda and pd.Index.map.

Consider a csv that looks like:

1,1,2,2
a,b,a,b
1,3,5,7
0,2,4,6

Read it like:

df = pd.read_csv('test.csv', header=[0, 1])
df

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

You can set the columns as part of a method chain, converting the first level to integers, like:

# set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0.
df.set_axis(df.columns.map(lambda i: (int(i[0]), i[1])), axis=1)

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

1

Is there any way to use integer type in DataFrame columns?

I find this quite elegant:

df = pd.read_csv('test.csv').rename(columns=int)

Note that int here is the built-in function int().
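
Because rename accepts any callable, the same pattern can tolerate headers that mix numeric and non-numeric labels. The sketch below is an assumption about how one might handle that case; to_int_if_possible is a hypothetical helper, not part of pandas, and the CSV content is made up for illustration.

```python
import io
import pandas as pd

def to_int_if_possible(label):
    """Return int(label) when the label is numeric, else the label unchanged."""
    try:
        return int(label)
    except (ValueError, TypeError):
        return label

# Hypothetical file with two numeric labels and one text label.
csv_text = "1,2,name\n0,0,x\n"

df = pd.read_csv(io.StringIO(csv_text)).rename(columns=to_int_if_possible)
labels = list(df.columns)
```

With plain rename(columns=int), the label 'name' would raise ValueError, so the helper is only needed when the header is not purely numeric.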

1 Comment

Elegant, indeed!
