Import pandas dataframe column as string not int

Question

I would like to import the following csv as strings not as int64. Pandas read_csv automatically converts it to int64, but I need this column as string.

ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166

df = read_csv('sample.csv')

df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Unfortunately using converters gives the same result.

df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

ihightower · Accepted Answer · 2020-12-02 04:12:02Z

234

Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm creating an issue about detecting integer overflows also.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
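To make the overflow concrete: each ID has 26 digits, well beyond the largest int64 value (2**63 - 1), so integer parsing cannot represent it. Below is a minimal, self-contained sketch of the dtype fix. It reads from an in-memory string through io.StringIO rather than sample.csv, and the names csv_text and INT64_MAX are only illustrative.

```python
import io

import pandas as pd

# Two rows of the sample data, held in memory instead of sample.csv.
csv_text = "ID\n00013007854817840016671868\n00013007854817840016749251\n"

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1

# dtype={'ID': object} tells the parser to keep the raw text of the column.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": object})

# Leading zeros are preserved because the values were never parsed as numbers.
print(df["ID"].tolist())

# The numeric value of an ID does not fit into int64.
print(int(df["ID"][0]) > INT64_MAX)
```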

Update as it helps others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype = str)

To read only selected columns as str, one can do this:

# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
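As a rough illustration of the selective approach, the sketch below uses made-up in-memory data with a third column named count (an assumption, not part of the question) to show that columns left out of the dict are still type-inferred as usual.

```python
import io

import pandas as pd

# Hypothetical data: 'prefix' and 'serial' carry leading zeros, 'count' is numeric.
csv_text = "prefix,serial,count\n001,000042,7\n002,000043,8\n"

str_cols = ["prefix", "serial"]

# Only the listed columns are forced to str; 'count' is inferred by pandas.
df = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in str_cols})

print(df["prefix"].tolist())
print(df["serial"].tolist())
print(df["count"].sum())
```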

edited Dec 2, 2020 at 4:12

ihightower


answered Nov 14, 2012 at 17:58

Wes McKinney



5 Comments

steveb Over a year ago

It also seems, if you want all columns to be interpreted as strings, one can do the following: dtype = str.

Josiah Yoder Over a year ago

It seems empty fields still come through as np.nan

jtcloud Over a year ago

Same issue here, but using keep_default_na = False resolved it for me.

Ross117 Over a year ago

Thank you for the comments. I also had to use dtype=str AND keep_default_na = False so that null values weren't nan.

Soy César Mora Over a year ago

Using the high-digit integers as a string saves a lot of headaches. Hero or villain? YOU'RE A HERO!!
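The empty-field behaviour raised in the comments above can be sketched as follows. The data and the column name name are invented for illustration; the example only assumes the documented behaviour of keep_default_na.

```python
import io

import pandas as pd

# Hypothetical data: the second row has an empty 'name' field.
csv_text = "ID,name\n0001,alice\n0002,\n"

# With dtype=str alone, the empty field is still treated as missing.
with_nan = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Adding keep_default_na=False keeps the empty field as an empty string.
without_nan = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(with_nan["name"].isna().tolist())
print(without_nan["name"].tolist())
```

Note that keep_default_na=False also stops other default markers such as NA or NULL from being read as missing, which may or may not be what you want.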

Itchydon · Accepted Answer · 2020-06-03 09:47:20Z

23

Since pandas 1.0 this has become much more straightforward. The following reads column 'ID' as dtype 'string':

pd.read_csv('sample.csv',dtype={'ID':'string'})

As the Getting started guide shows, a dedicated 'string' dtype has been introduced (before that, strings were stored as dtype 'object').
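A small sketch of the same call against in-memory data, assuming pandas 1.0 or later; csv_text is an illustrative name. It checks that the resulting column uses the dedicated string extension dtype rather than object.

```python
import io

import pandas as pd

csv_text = "ID\n00013007854817840016671868\n00013007854817840016749251\n"

# 'string' requests the pandas string extension dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "string"})

# The column dtype is a StringDtype instance, and leading zeros are intact.
print(isinstance(df["ID"].dtype, pd.StringDtype))
print(df["ID"].tolist())
```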

edited Jun 3, 2020 at 9:47

Itchydon


answered Apr 14, 2020 at 3:03

denis_smyslov



spencerlyon2 · Accepted Answer · 2012-11-10 01:06:15Z

22

This probably isn't the most elegant way to do it, but it gets the job done.

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])

In[4]: df
Out[4]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file.

edited Nov 10, 2012 at 1:06

answered Nov 9, 2012 at 2:54

spencerlyon2



Alex · Accepted Answer · 2024-01-10 14:52:23Z

0

The following approach seems to work to get every column as a string:

import pandas as pd
from collections import defaultdict

df = pd.read_csv(
    data_path,
    dtype=defaultdict(lambda: 'string'),
    keep_default_na=False,
)
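For a version that runs without a file on disk, here is a hedged sketch of the same idea. It assumes a pandas release that accepts a defaultdict for dtype (to my knowledge 1.5 or later), and the data and the column name n are made up.

```python
import io
from collections import defaultdict

import pandas as pd

# Hypothetical data: one ID-like column and one column that looks numeric.
csv_text = "ID,n\n0001,7\n0002,8\n"

# Any column not listed explicitly falls back to the 'string' dtype.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype=defaultdict(lambda: "string"),
    keep_default_na=False,
)

# Both columns come back as strings, including the numeric-looking one.
print(df["ID"].tolist())
print(df["n"].tolist())
```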

answered Jan 10, 2024 at 14:52

Alex

