I have a DataFrame with a column whose values are NumPy arrays. For example:

import numpy as np
import pandas as pd
df = pd.DataFrame([{"id": 1, "sample": np.array([1, 2, 3])}, {"id": 2, "sample": np.array([2, 3, 4])}])
df.to_csv("./tmp.csv", index=False)

If I save df to CSV and load it again, the "sample" column comes back as strings:

df_from_csv = pd.read_csv("./tmp.csv")
df_from_csv.equals(pd.DataFrame([{"id": 1, "sample": "[1 2 3]"}, {"id": 2, "sample": "[2 3 4]"}]))
True

Is there a better way to save/load my data that does not require manually converting '[1 2 3]' back to its corresponding array?

  • This is not a reliable way of saving such a dataframe. As you found, it writes the str display of each array element. CSV is inherently a 2D format, so it can't handle the implied third dimension of these arrays. If the arrays are large enough, that str will be abbreviated with '...', and such an array cannot be recovered. Commented Nov 17, 2022 at 16:32
  • @hpaulj any suggestion on how to save it? Commented Nov 17, 2022 at 18:02
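
The truncation mentioned in the first comment can be seen with a small sketch. It assumes NumPy's default print options, where arrays with more than 1000 elements are summarized; the variable names are only illustrative.

```python
import numpy as np

small = np.arange(5)
large = np.arange(2000)  # more elements than the default print threshold of 1000

small_text = str(small)  # every element is written out
large_text = str(large)  # middle elements are replaced by '...'

print(small_text)
print(large_text)
print("..." in large_text)
```

Since to_csv writes this str form for each cell, the elided elements of a large array never reach the file, so no converter can restore them.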

1 Answer

You can use a converter in read_csv:

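import re
from ast import literal_eval

import numpy as np
import pandas as pd

def to_array(x):
    # Drop padding just inside the brackets (e.g. '[  1  10 100]'),
    # then turn each remaining whitespace run into a comma: '[1 2 3]' -> '[1,2,3]'
    x = re.sub(r'\[\s+', '[', x.strip())
    x = re.sub(r'\s+\]', ']', x)
    return np.array(literal_eval(re.sub(r'\s+', ',', x)))

df_from_csv = pd.read_csv("./tmp.csv", converters={'sample': to_array})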

#    id     sample
# 0   1  [1, 2, 3]
# 1   2  [2, 3, 4]

df_from_csv.loc[0, 'sample']

# array([1, 2, 3])
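
If you control the writing side as well, one option is to avoid NumPy's str form altogether. The following is a minimal sketch, not part of the answer above: it assumes the arrays hold plain numbers, stores each one as a JSON list in the CSV cell, and parses it back with a converter. An in-memory buffer stands in for the file, and the names out, buffer and restored are illustrative.

```python
import io
import json

import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"id": 1, "sample": np.array([1, 2, 3])},
    {"id": 2, "sample": np.array([2, 3, 4])},
])

# Write each array as a JSON list, so the cell text is complete and unambiguous
out = df.assign(sample=df["sample"].map(lambda a: json.dumps(a.tolist())))
buffer = io.StringIO()  # stands in for a file such as "./tmp.csv"
out.to_csv(buffer, index=False)

# Parse the JSON text back into arrays while reading
buffer.seek(0)
restored = pd.read_csv(
    buffer,
    converters={"sample": lambda s: np.array(json.loads(s))},
)

print(restored.loc[0, "sample"])
```

Unlike the str form, the JSON text is never abbreviated, so this also works for long arrays, at the cost of larger files.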