
I have a dataframe where one of the columns is a numpy array:

 DF

      Name                     Vec
 0  Abenakiite-(Ce) [0.0, 0.0, 0.0, 0.0, 0.0, 0.043, 0.0, 0.478, 0...
 1  Abernathyite    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
 2  Abhurite        [0.176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.235, 0...
 3  Abswurmbachite  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0,...

When I check the data type of each element, the correct data type is returned.

 type(DF['Vec'].iloc[1])
 numpy.ndarray

I save this into a csv file:

DF.to_csv('.\\file.csv',sep='\t')

Now, when I read the file again,

new_DF=pd.read_csv('.\\file.csv',sep='\t')

and check the datatype of Vec at index 1:

type(new_DF['Vec'].iloc[1])   
str

The size of the numpy array is 1x127.

The data type has changed from a numpy array to a string. I can also see some newline characters inside the individual vectors. I think this happens when the vector is written into the csv, but I don't know how to fix it. Can someone please help?
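The behaviour is easy to reproduce in a few lines (a minimal sketch; the name and file name are just placeholders):

```python
import numpy as np
import pandas as pd

# A 1x127 vector column is a numpy array in memory,
# but comes back from CSV as its string representation.
df = pd.DataFrame({'Name': ['Abernathyite'], 'Vec': [np.zeros(127)]})
df.to_csv('file.csv', sep='\t')

new_df = pd.read_csv('file.csv', sep='\t')
print(type(df['Vec'].iloc[0]))      # <class 'numpy.ndarray'>
print(type(new_df['Vec'].iloc[0]))  # <class 'str'>
```

The string even contains newlines, because numpy wraps long array reprs at the line width, which explains the "new line elements" seen in the file.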

Thanks!

  • The information about data types is not saved into a CSV file. There is no way for the Pandas CSV reader to know that what you are reading used to be a NumPy array in a past life. You should either save the array separately as a .npy file or transform the string back into an array yourself. Commented Jun 19, 2018 at 17:54
  • You should use dtype in read_csv. It is mentioned in the documentation. Commented Jun 19, 2018 at 17:57
  • What else do you expect? CSV is a text file. The string format of an array, e.g. '[0 1 2]', is the only way it can write the 2nd column. It can't write some sort of binary representation of the array (except maybe using pickle.dumps). Look at the csv file (with any text viewer). Commented Jun 19, 2018 at 18:11
  • I changed the read_csv command to: new_DF=pd.read_csv('.\\file.csv',sep='\t',dtype={'Vec':np.ndarray}) However, the new error is: dtype <class 'numpy.ndarray'> not understood Commented Jun 19, 2018 at 18:15
  • dtype refers to the elements of an array, not the type of the array as a whole. I don't think read_csv can handle this type of input. It may be possible, though to process those strings after they are in the dataframe. Commented Jun 19, 2018 at 18:38
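The .npy alternative from the first comment can be sketched like this (the file and column names are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Store the vectors in a binary .npy file, which preserves dtype and
# shape exactly, and keep only the non-array columns in the CSV.
df = pd.DataFrame({'Name': ['Abernathyite', 'Abhurite'],
                   'Vec': [np.zeros(127), np.ones(127)]})

np.save('vecs.npy', np.stack(df['Vec'].tolist()))       # 2-D float array
df[['Name']].to_csv('names.csv', sep='\t', index=False)

# Reading back: the vectors come back as numpy.ndarray, not str.
vecs = np.load('vecs.npy')
names = pd.read_csv('names.csv', sep='\t')
restored = names.assign(Vec=list(vecs))
print(type(restored['Vec'].iloc[1]))  # <class 'numpy.ndarray'>
```

This sidesteps string parsing entirely, at the cost of keeping two files in sync.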

2 Answers


In the comments I made a mistake and said dtype instead of converters. What you want is to convert them as you read them using a function. With some dummy variables:

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2'],
                   'Vec': [np.array([1, 2]), np.array([3, 4])]})
df.to_csv('tmp.csv')

def converter(instr):
    # Strip the surrounding brackets, then parse the space-separated numbers
    return np.fromstring(instr[1:-1], sep=' ')

df1 = pd.read_csv('tmp.csv', converters={'Vec': converter})
df1.iloc[0, 2]
array([1., 2.])

3 Comments

Thank you! This totally worked. What is the last line, df1.iloc[0,2]? It returns 'name1'
It was just to show that the Vec column is converted to an array.
Hi, could you take a look at my very similar problem? I followed your answer but only received empty [] fields. Thanks stackoverflow.com/questions/60960170/…

The answer above works. If you get empty fields, add the string slicing [1:-1]!

This converts the string [-2.0797753, 3.6340227, -1.7011836]

to -2.0797753, 3.6340227, -1.7011836

which is the format np.fromstring expects: https://numpy.org/doc/stable/reference/generated/numpy.fromstring.html
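A quick check of the combined fix (using sep=',' here, since this example string is comma-separated, unlike the space-separated numpy repr in the accepted answer):

```python
import numpy as np

s = '[-2.0797753, 3.6340227, -1.7011836]'
# Without the [1:-1] slice, np.fromstring stops at the leading
# bracket and can return an empty array.
arr = np.fromstring(s[1:-1], sep=',')
print(arr)
```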

1 Comment

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
