Pandas writes a string to CSV instead of an array

Question

I would like to store a pandas DataFrame in a CSV file. The DataFrame has two columns: the first holds strings, while the second holds an array in each row.

The problem is that instead of a string and an array per row, the CSV file contains two strings per row, like this:

0004d4463b50_01.jpg,"[ 611461      44  613328 ...,       5 1767504      19]"

An example of my code can be found here:

import pandas as pd

rle = []

# run test loop with a progress bar
for i, (images, _) in enumerate(loader):
    # do some stuff here
    # 'rle_local' is an ndarray with more than a thousand elements
    rle.append(rle_local)

# 'names' contains the strings
df = pd.DataFrame({'strings': names, 'arrays': rle})
df.to_csv(file_path, index=False, compression='gzip')
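The same behavior can be reproduced without the data loader. This is a minimal sketch: the file names and the array lengths (2000 and 1500 elements, chosen to exceed NumPy's default print threshold of 1000) are hypothetical, and `to_csv()` is called without a path so it returns the CSV text instead of writing a file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: one name per image and one
# long ndarray per image (more than 1000 elements each).
names = ['img_01.jpg', 'img_02.jpg']
rle = [np.arange(2000), np.arange(1500)]

df = pd.DataFrame({'strings': names, 'arrays': rle})

# With no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)

# Each ndarray cell ends up in its str() form, which NumPy abbreviates
# with '...' for large arrays, so most of the numbers never reach the CSV.
print(csv_text)
```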

Any ideas on what is wrong here, and why it stores strings instead of the numbers the arrays contain?

Thanks in advance!

The desired output would be 00087a6bd4dc_01.jpg,879386 40 881253 141 883140 205 885009 17 885032 259 886923 308 888839 328 890754 340 892670 347 894587 352 896503 357 898420 360 900336 364 902253 367 904170 370 906086 374 ... (first the string, then all the numbers contained in the array). – Manuel Lagunas, Commented Sep 7, 2017 at 14:06
I don't think I can recover the array by parsing the string, since it stores ... instead of the actual content. – Manuel Lagunas, Commented Sep 7, 2017 at 14:08
I am using pandas 0.20.3 and Python 3.6. I double-checked: rle is a Python list whose elements are ndarrays. It seems to be writing the output of the ndarray __str__ method to the file (as if I did print(rle[0])). – Manuel Lagunas, Commented Sep 7, 2017 at 14:19
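The diagnosis in that comment can be checked directly: NumPy's str() summarizes any array whose size exceeds the `threshold` print option, which is 1000 by default. A small sketch, assuming the default print options have not been changed:

```python
import numpy as np

# Arrays with more elements than the 'threshold' print option are
# summarized: only the first and last few elements are shown, with '...'.
threshold = np.get_printoptions()['threshold']

small = np.arange(threshold)       # exactly at the threshold: printed in full
large = np.arange(threshold + 1)   # one element over: summarized

print(threshold)
print('...' in str(small))
print('...' in str(large))
```

This matches the question's observation: arrays with more than a thousand elements come out of str() abbreviated, and that abbreviated text is what lands in the CSV cell.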
You're right, it only applies to NumPy arrays. If you convert them to lists, it should work. – IanS, Commented Sep 7, 2017 at 14:23
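A minimal sketch of the list conversion suggested in that comment, using hypothetical data shaped like the question's. str() of a Python list is never abbreviated, so every number reaches the CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's: names plus long ndarrays.
names = ['img_01.jpg', 'img_02.jpg']
rle = [np.arange(2000), np.arange(1500)]
df = pd.DataFrame({'strings': names, 'arrays': rle})

# Replace each ndarray with a plain Python list before writing.
df['arrays'] = df['arrays'].apply(lambda a: a.tolist())

csv_text = df.to_csv(index=False)
```

The trade-off is that each cell now contains Python list syntax (brackets and commas, so the field is quoted), which has to be parsed if the file is read back.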

IanS · Accepted Answer · 2017-09-07 14:42:33Z


One solution is to serialize the arrays in the DataFrame to space-separated strings before calling to_csv.

# overwrites original arrays!
df['arrays'] = df['arrays'].apply(lambda a: ' '.join(map(str, a)))
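Applied to the question's workflow, the same one-liner can be sketched end to end. The data, file name, and temporary directory below are hypothetical stand-ins; the to_csv call keeps the question's index=False and compression='gzip'.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's: names plus long ndarrays.
names = ['img_01.jpg', 'img_02.jpg']
rle = [np.arange(2000), np.arange(1500)]
df = pd.DataFrame({'strings': names, 'arrays': rle})

# Serialize each array to one space-separated string (overwrites the column).
df['arrays'] = df['arrays'].apply(lambda a: ' '.join(map(str, a)))

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'rle.csv.gz')
    # Write a gzip-compressed CSV, as in the question.
    df.to_csv(file_path, index=False, compression='gzip')
    # Read it back to inspect what was actually stored.
    stored = pd.read_csv(file_path, compression='gzip')

# All 2000 numbers of the first array survive, with no '...'.
first_row_numbers = stored.loc[0, 'arrays'].split()
```

Each row is now written as the name followed by every number in the array, which is the format asked for in the comments.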

Quick example:

import numpy as np
import pandas as pd

s = pd.Series([np.arange(100, 200), np.arange(200, 300)])
s.apply(lambda a: ' '.join(map(str, a))).to_csv()
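Because the serialized form keeps every number, the arrays can be recovered after reading the file, which addresses the concern in the comments about parsing the stored string. A minimal sketch, assuming integer arrays (like the question's run-length encodings) and hypothetical file names, with an in-memory buffer standing in for the CSV file:

```python
import io

import numpy as np
import pandas as pd

# Serialize as in the answer: one space-separated string per array.
original = [np.arange(100, 200), np.arange(200, 350)]
df = pd.DataFrame({'strings': ['a.jpg', 'b.jpg'], 'arrays': original})
df['arrays'] = df['arrays'].apply(lambda a: ' '.join(map(str, a)))
csv_text = df.to_csv(index=False)

# Read the CSV back and split each string into integers again.
restored = pd.read_csv(io.StringIO(csv_text))
restored['arrays'] = restored['arrays'].apply(
    lambda s: np.array([int(x) for x in s.split()], dtype=np.int64)
)
```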

