I have a dataframe where one of the columns contains a numpy array in each row:
DF
              Name                                                Vec
0  Abenakiite-(Ce)  [0.0, 0.0, 0.0, 0.0, 0.0, 0.043, 0.0, 0.478, 0...
1     Abernathyite  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2         Abhurite  [0.176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.235, 0...
3   Abswurmbachite  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0,...
When I check the data type of an element, the correct type is returned.
type(DF['Vec'].iloc[1])
numpy.ndarray
I save this into a csv file:
DF.to_csv('.\\file.csv', sep='\t')
Now, when I read the file again,
new_DF = pd.read_csv('.\\file.csv', sep='\t')
and check the datatype of Vec at index 1:
type(new_DF['Vec'].iloc[1])
str
The size of the numpy array is 1x127.
The data type has changed from a numpy array to a string. I can also see some newline characters inside the individual vectors. I think this happens when the vector is written to the csv file, but I don't know how to fix it. Can someone please help?
Thanks!
Comments:

Did you try the dtype argument of read_csv? It is mentioned in the documentation.

'[0 1 2]' is the only way it can write the 2nd column. It can't write some sort of binary representation of the array (except maybe using pickle.dumps). Look at the csv file (with any text viewer). dtype refers to the elements of an array, not the type of the array as a whole. I don't think read_csv can handle this type of input. It may be possible, though, to process those strings after they are in the dataframe.
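A minimal sketch of that last suggestion, assuming the strings in the csv look like numpy's text representation (space-separated values inside brackets, possibly with embedded newlines); the parse_vec helper name is just for illustration:

import numpy as np
import pandas as pd

new_DF = pd.read_csv('.\\file.csv', sep='\t')

def parse_vec(s):
    # Drop the surrounding brackets, then split on any whitespace;
    # split() also swallows the embedded newline characters.
    return np.array(s.strip('[]').split(), dtype=float)

new_DF['Vec'] = new_DF['Vec'].apply(parse_vec)
type(new_DF['Vec'].iloc[1])  # numpy.ndarray again

If the elements were written comma-separated instead, ast.literal_eval followed by np.array would do the same job. And as the pickle.dumps remark hints, you can sidestep the text round-trip entirely by pickling the dataframe, which keeps the ndarray cells intact:

DF.to_pickle('.\\file.pkl')
new_DF = pd.read_pickle('.\\file.pkl')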