193

When I read a CSV file into a pandas DataFrame, each column is cast to its own data type. One of my columns was converted to object. I want to perform string operations on this column, such as splitting the values and creating a list, but no such operation seems possible because its dtype is object. Can anyone tell me how to convert all the items of a column to strings instead of objects?

I tried several approaches, but nothing worked: astype, str(), to_string, etc.

a=lambda x: str(x).split(',')
df['column'].apply(a)

or

df['column'].astype(str)
3
  • Sometimes string operations fail when unrecognized characters are present. Paste your data into Notepad and check whether there are odd symbols where a blank space (or something else) is expected. Commented Apr 13, 2022 at 7:20
  • 21
    Everything here is outdated; see the answer in the duplicate question: df['id'] = df['id'].astype("string") Commented Nov 8, 2022 at 10:18
  • 1
    @AJAJ's answer is the only one that turns object into strings. However, it turns the column into string[python]; I don't know whether that implies anything beyond plain "string". Commented Nov 3, 2023 at 11:00
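
The conversion suggested in the comments can be sketched as follows. This is a minimal example, assuming a pandas version that provides the "string" extension dtype; the sample values are hypothetical and only the column name id comes from the comment.

```python
import pandas as pd

# Hypothetical sample data; the column name "id" follows the comment above.
df = pd.DataFrame({"id": pd.Series(["a,b", "c"], dtype=object)})

# Convert to the dedicated string extension dtype and assign the result back.
df["id"] = df["id"].astype("string")

# The column now uses StringDtype rather than object.
is_string_dtype = isinstance(df["id"].dtype, pd.StringDtype)

# String methods are available through the .str accessor.
parts = df["id"].str.split(",").tolist()
```

Note that astype returns a new Series, so the result has to be assigned back to the column for the dtype change to stick.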

4 Answers

84

Since strings have variable length, pandas stores them as object dtype by default. If you want to store them as a fixed-width string type instead, you can do something like this:

df['column'] = df['column'].astype('|S80')  # fixed-width byte strings, max length set to 80 bytes

or alternatively

df['column'] = df['column'].astype('|S')  # length defaults to the longest value encountered

5 Comments

Which Python version are you using? It does not work for me.
I got TypeError: data type "bytes256" not understood. Any suggestion why?
Since pandas inherits almost all of NumPy's type system (apart from category), please refer to docs.scipy.org/doc/numpy/reference/… for more information about type shortcuts.
Works in Python 3.8.2
This ends with an error for me with non-Latin characters (like á).
60

Did you try assigning it back to the column?

df['column'] = df['column'].astype('str') 

As explained in this question, a pandas DataFrame stores pointers to the strings, which is why the column is of type 'object'. As per the docs, you could try:

df['column_new'] = df['column'].str.split(',') 
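
To illustrate the point about object dtype, here is a small sketch showing that a column reported as object can still hold ordinary Python strings and be split through the .str accessor. The sample values are hypothetical, and the explicit dtype=object is only there to reproduce the situation in the question.

```python
import pandas as pd

# Hypothetical sample data, stored explicitly with object dtype.
df = pd.DataFrame({"column": pd.Series(["a,b,c", "d,e"], dtype=object)})

# Object dtype only says the column holds Python objects; check the elements.
element_types = set(df["column"].map(type))

# The .str accessor works on an object column whose values are strings.
df["column_new"] = df["column"].str.split(",")
```

If element_types contains anything other than str, that mix of types is a more likely cause of failing string operations than the object dtype itself.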

6 Comments

Yeah, I tried that. The data type of that column remained object even after trying that.
Could you paste a sample of your dataframe?
I have edited the answer, please check if it works.
Neither of them works :(
37

Not answering the question directly, but it might help someone else.

I have a column called Volume that contains both "-" (for invalid/NaN values) and numbers formatted with ",":

df['Volume'] = df['Volume'].astype('str')
df['Volume'] = df['Volume'].str.replace(',', '')
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')

Casting to string is required so that str.replace can be applied to every value in the column.

pandas.Series.str.replace
pandas.to_numeric
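
Why the cast matters can be seen in a small sketch, assuming the column mixes strings with values that are already numeric. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample: formatted strings, a "-" placeholder, and a plain integer.
df = pd.DataFrame({"Volume": pd.Series(["1,000", "-", 250], dtype=object)})

# Without the cast, .str methods return NaN for the non-string element 250.
without_cast = df["Volume"].str.replace(",", "")

# With the cast, every element is a string before the replacement.
as_text = df["Volume"].astype("str").str.replace(",", "")

# "-" cannot be parsed as a number, so errors="coerce" turns it into NaN.
volume = pd.to_numeric(as_text, errors="coerce")
```

Here the integer 250 would be lost without the cast, while with it only the "-" placeholder ends up as NaN.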

6

You could try using the df['column'].str accessor and then call any string method on it. The pandas documentation lists the available methods, such as split.

2 Comments

Nope, pandas will store the pointer to the string and the final column type will be 'object'
I believe pandas will ALWAYS store string columns as objects
