How to convert column with dtype as object to string in Pandas Dataframe [duplicate]

Question

When I read a CSV file into a pandas DataFrame, each column is assigned its own dtype. One of my columns was read as object. I want to perform string operations on this column, such as splitting the values and creating a list, but this does not seem possible because its dtype is object. How can I convert all the items in the column to strings instead of objects?

I tried several approaches, including astype, str(), and to_string, but nothing worked.

a=lambda x: str(x).split(',')
df['column'].apply(a)

or

df['column'].astype(str)

Sometimes string operations fail when unrecognized characters are present. Paste your data into Notepad and check for odd symbols where a blank space (or something else) is expected.
– Dimanjan, Commented Apr 13, 2022 at 7:20
Everything here is outdated; see the answer in the duplicate question: df['id'] = df['id'].astype("string")
– AJ AJ, Commented Nov 8, 2022 at 10:18
@AJAJ's answer is the only one that turns object into strings. However, it produces string[python], and I don't know whether that implies anything different from plain "string".
– chilifan, Commented Nov 3, 2023 at 11:00

Siraj S. · Accepted Answer · 2017-06-22 23:10:30Z

84

Since string data has variable length, pandas stores it as object dtype by default. If you want to store the values as a string type instead, you can do something like this:

df['column'] = df['column'].astype('|S80')  # maximum length set to 80 bytes

or, alternatively:

df['column'] = df['column'].astype('|S')  # length defaults to the longest value encountered
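
As a hedged aside, the comments on the question point to `astype("string")`, pandas' dedicated string dtype (available from pandas 1.0). The sketch below is only an illustration: the column name and sample values are assumptions, not data from the question. It checks that the dtype really changes and that `.str` methods still work afterwards.

```python
import pandas as pd

# Hypothetical sample data; the column name and values are illustrative only.
df = pd.DataFrame({"column": ["a,b", "c,d,e", None]})

# Convert to pandas' dedicated string dtype (requires pandas >= 1.0).
df["column"] = df["column"].astype("string")
is_string_dtype = isinstance(df["column"].dtype, pd.StringDtype)

# String methods are reached through the .str accessor;
# the missing value stays missing instead of becoming a list.
parts = df["column"].str.split(",")
first_two = [list(p) for p in parts.iloc[:2]]
```

Unlike the `'|S'` conversion, this keeps the values as text rather than bytes.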

answered Jun 22, 2017 at 23:10

Siraj S.


5 Comments

VMEscoli Over a year ago

which python version are you using? it does not work for me

Jia Gao Over a year ago

got TypeError: data type "bytes256" not understood, any suggestion why?

ayorgo Over a year ago

Since pandas inherits almost all of NumPy's type system (apart from category), please refer to docs.scipy.org/doc/numpy/reference/… for more information about type shortcuts.

gies0r Over a year ago

Works in Python 3.8.2

Michal Skop Over a year ago

this ends with an error with non-latin characters for me (like á)

Community · Accepted Answer · 2017-05-23 12:18:17Z

60

Did you try assigning it back to the column?

df['column'] = df['column'].astype('str')

As explained in this question, a pandas DataFrame stores pointers to the strings, which is why the dtype is 'object'. As per the docs, you could try:

df['column_new'] = df['column'].str.split(',')
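
A minimal sketch of that split, assuming a hypothetical `column` of comma-separated text (the sample values are made up for illustration). It relies only on the `.str` accessor, which operates on a string-valued column without any dtype conversion first.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"column": ["a,b", "c,d,e"]})

# .str.split returns a new Series of lists, so assign it to a column.
df["column_new"] = df["column"].str.split(",")
split_lists = [list(p) for p in df["column_new"]]

# expand=True spreads the pieces across columns instead of building lists;
# shorter rows are padded with missing values.
wide = df["column"].str.split(",", expand=True)
```

Whether a column of lists or the wide layout is more convenient depends on what happens to the pieces afterwards.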

edited May 23, 2017 at 12:18

CommunityBot


answered Nov 27, 2015 at 12:51

Hypothetical Ninja


6 Comments

Pranav Over a year ago

Yeah I tried that. The datatype of that column remained as object even after trying that.

Hypothetical Ninja Over a year ago

could you paste a sample of your dataframe?

Hypothetical Ninja Over a year ago

I have edited the answer, please check if it works

ihmpall Over a year ago

Both of them don't work :(

Keith Over a year ago

stackoverflow.com/questions/21018654/…


Gerard Rovira · Accepted Answer · 2018-03-30 10:58:02Z

37

Not answering the question directly, but it might help someone else.

I have a column called Volume that contains both - (marking invalid/NaN values) and numbers formatted with commas (,).

df['Volume'] = df['Volume'].astype('str')
df['Volume'] = df['Volume'].str.replace(',', '')
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')

Casting to string is required so that str.replace can be applied.

pandas.Series.str.replace
pandas.to_numeric
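
The three steps above can be tried end to end with the sketch below. The sample values are hypothetical, chosen to include a `-` placeholder and comma-formatted numbers; `regex=False` is an addition here so that the comma is treated as a literal character.

```python
import pandas as pd

# Hypothetical sample: "-" marks a missing value, other entries contain commas.
df = pd.DataFrame({"Volume": ["1,234", "-", "56", "7,890,123"]})

# Make sure every value is a string before using the .str accessor.
df["Volume"] = df["Volume"].astype("str")
# Remove the commas; regex=False matches "," literally.
df["Volume"] = df["Volume"].str.replace(",", "", regex=False)
# errors="coerce" turns anything that cannot be parsed (here "-") into NaN.
df["Volume"] = pd.to_numeric(df["Volume"], errors="coerce")
```

After this, the column is numeric and the `-` entry is a missing value.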

answered Mar 30, 2018 at 10:58

Gerard Rovira


koshmaster · Accepted Answer · 2017-08-10 15:09:42Z

6

You could try using the df['column'].str accessor and then call any string function on it. The pandas documentation lists the available ones, such as split.
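
As a rough illustration of the accessor beyond `split`, the sketch below chains a few `.str` methods on a hypothetical column; the sample values are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data with stray whitespace and mixed case.
df = pd.DataFrame({"column": ["  Apple ", "banana", "Cherry  "]})

# Each .str method returns a new Series, so calls can be chained.
cleaned = df["column"].str.strip().str.lower()

# Element-wise substring test and length, both computed per row.
has_an = cleaned.str.contains("an").tolist()
lengths = cleaned.str.len().tolist()
```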

answered Aug 10, 2017 at 15:09

koshmaster


2 Comments

asa Over a year ago

Nope, pandas will store the pointer to the string and the final column type will be 'object'

koshmaster Over a year ago

I believe pandas will ALWAYS store string columns as objects
