
I have a simple pandas dataframe with a column:

import pandas as pd

col = ['A']  # a flat list; a nested list would create a MultiIndex
data = [[1.0], [2.3], [3.4]]
df = pd.DataFrame.from_records(data, columns=col)

This creates a dataframe with one column of type np.float64, which is what I want.

Later in the process, I want to add another column of type string.

df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"

The dtype of this column is coming through as object, but I need it to be a string type. So I do the following:

df['SOMETEXT'] = df['SOMETEXT'].astype(str)

If I look at the dtype again, I get the same dtype for that column: object.

I have a process down my workflow that is dtype sensitive and I need the column to be a string.

Any ideas?

array = df.to_records(index=False) # convert to numpy array

The dtypes on the array still carry the object dtype for that column, but I need it to be a string.


1 Answer


In pandas, strings are stored under the generic object dtype by default. It confused me too when I first started.
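As an aside, assuming a newer pandas (1.0 or later): pandas has since gained a dedicated, nullable string dtype, so if a downstream step genuinely checks the dtype, casting with 'string' (rather than str) gives you something other than object. A minimal sketch, reconstructing the asker's frame:

```python
import pandas as pd

# Hypothetical reconstruction of the asker's DataFrame
df = pd.DataFrame({'A': [1.0, 2.3, 3.4]})
df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"

# astype('string') requests pandas' dedicated StringDtype (pandas >= 1.0),
# unlike astype(str), which leaves the column as object
df['SOMETEXT'] = df['SOMETEXT'].astype('string')
print(df['SOMETEXT'].dtype)  # string
```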

Once in NumPy, you can cast the string:

In [24]: array['SOMETEXT'].astype(str)
Out[24]: 
array(['SOME TEXT FOR ANALYSIS', 'SOME TEXT FOR ANALYSIS',
       'SOME TEXT FOR ANALYSIS'], 
      dtype='<U22')
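A related option, assuming pandas 0.24 or later: to_records accepts a column_dtypes mapping, so the cast to a fixed-width string can happen during the conversion itself rather than afterwards. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame.from_records([[1.0], [2.3], [3.4]], columns=['A'])
df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"

# column_dtypes lets to_records emit a fixed-width unicode dtype
# for the string column while leaving the float column alone
array = df.to_records(index=False, column_dtypes={'SOMETEXT': '<U22'})
print(array.dtype['SOMETEXT'])  # <U22
```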

4 Comments

Yeah, but when you export the DataFrame to a NumPy array, it carries the object dtype and doesn't convert the data to a string.
I forgot a piece of information; please see the updated question.
That's just the internal implementation. Have a look at one of the answers to the potential duplicate for an explanation.
It won't work, as this is not intended. There is an explanation of why strings are stored as object dtype.
