I would like to store a pandas DataFrame in a CSV file. The DataFrame has two columns: the first holds strings, while the second holds one NumPy array per row.

The problem is that instead of storing a string and an array per row, the CSV file ends up with two strings per row, like this:

0004d4463b50_01.jpg,"[ 611461      44  613328 ...,       5 1767504      19]"

An example of my code can be found here:

rle = []

# run test loop with a progress bar
for i, (images, _) in enumerate(loader): 
    # do some stuff here
    # 'rle_local' is an ndarray with more than a thousand elements
    rle.append(rle_local)

# 'names' contains the strings
df = pd.DataFrame({'strings': names, 'arrays': rle})
df.to_csv(file_path, index=False, compression='gzip')

Any ideas on what is wrong here and why it stores strings instead of the bunch of numbers that the arrays contain?

Thanks in advance!

  • Desired output would be 00087a6bd4dc_01.jpg,879386 40 881253 141 883140 205 885009 17 885032 259 886923 308 888839 328 890754 340 892670 347 894587 352 896503 357 898420 360 900336 364 902253 367 904170 370 906086 374 ... First the string and then all the numbers that are contained in the array. Commented Sep 7, 2017 at 14:06
  • I do not think I am able to recover the array by parsing the string since it stores ... instead of the content Commented Sep 7, 2017 at 14:08
  • Oh I see, I thought the ... were added by you! Commented Sep 7, 2017 at 14:10
  • I am using pandas 0.20.3 and Python 3.6. I double-checked: rle is a Python list and its elements are ndarrays. It seems the file stores the output of the ndarray __str__ method (as if I had called print(rle[0])). Commented Sep 7, 2017 at 14:19
  • You're right, it only applies to numpy arrays. If you convert them to lists it should work. Commented Sep 7, 2017 at 14:23

1 Answer

A solution would be to serialize each array in the DataFrame to a space-separated string before writing the CSV.

# overwrites original arrays!
df['arrays'] = df['arrays'].apply(lambda a: ' '.join(map(str, a)))
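For context on why the explicit join is needed: the `...` in the CSV appears to come from NumPy's string representation, which summarizes arrays longer than the print threshold (1000 elements by default). The following is a minimal sketch of that behavior, assuming the default print options are in effect; the array `big` is a made-up stand-in for one of the `rle_local` arrays.

```python
import numpy as np

# Hypothetical stand-in for one 'rle_local' array; it has more elements
# than NumPy's default print threshold (1000).
big = np.arange(2000)

# str() summarizes long arrays, replacing the middle with an ellipsis.
# This summarized text is what ends up in the CSV cell.
summarized = str(big)

# Converting element by element keeps every value.
full = ' '.join(map(str, big))

print('...' in summarized)  # the summary contains an ellipsis
print(len(full.split()))    # number of values kept by the join
```

Raising the threshold with `np.set_printoptions` would also change what `str()` produces, but that alters global state and still leaves brackets and line breaks in the output, so the explicit join is the more predictable option.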

Quick example:

import numpy as np
import pandas as pd
s = pd.Series([np.arange(100, 200), np.arange(200, 300)])
s.apply(lambda a: ' '.join(map(str, a))).to_csv()
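To get the arrays back after reading the file, the strings have to be split and converted again. Here is a rough sketch of the full round trip, assuming the arrays contain integers. The `names` and `rle` values are invented for illustration, and an in-memory buffer replaces the gzip file from the question to keep the example self-contained.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's 'names' and 'rle'.
names = ['a.jpg', 'b.jpg']
rle = [np.arange(0, 1500), np.arange(1500, 3000)]

df = pd.DataFrame({'strings': names, 'arrays': rle})

# Serialize each array to a space-separated string (overwrites the arrays).
df['arrays'] = df['arrays'].apply(lambda a: ' '.join(map(str, a)))

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Read the CSV back and parse each string into an integer array.
# dtype=np.int64 is an assumption; adjust it to match the real data.
restored = pd.read_csv(buffer)
restored['arrays'] = restored['arrays'].apply(
    lambda text: np.array(text.split(), dtype=np.int64)
)

print(np.array_equal(restored['arrays'][0], rle[0]))
```

A space works as the inner separator because it cannot be confused with the comma that delimits CSV fields, so the serialized arrays need no extra quoting.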