18

Let us consider the following pandas dataframe:

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

where the B column is composed of two numpy arrays.

If we save the dataframe and then load it again, the numpy arrays are converted into strings.

df.to_csv('test.csv', index=False)
df = pd.read_csv('test.csv')

Is there a simple way to solve this problem? After loading, each value in column B is a string such as '[6 7]' rather than a numpy array.

2 Answers

21

You can pickle the data instead.

df.to_pickle('test.pkl')
df = pd.read_pickle('test.pkl')

This ensures that the data types, including the numpy arrays, are preserved. However, the file is not human readable.
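
To check that the arrays really survive the round trip, a minimal sketch like the one below can be run. The file name `test.pkl` and the use of a temporary directory are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.pkl')  # hypothetical file name
    df.to_pickle(path)
    loaded = pd.read_pickle(path)

# Each cell in column B should still be a numpy array, not a string.
print(type(loaded.loc[0, 'B']))
print(loaded.loc[0, 'B'] + 1)  # array arithmetic works on the loaded value
```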

If human readability is an issue, I would recommend saving it as a JSON file instead.

df.to_json('abc.json')
df = pd.read_json('abc.json')
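
One caveat with the JSON route: JSON has no numpy array type, so the arrays are written as plain JSON lists and, as far as I can tell, come back from `read_json` as Python lists. The sketch below assumes that behaviour and rebuilds the arrays with `np.array`; the temporary directory is only there for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'abc.json')
    df.to_json(path)
    loaded = pd.read_json(path)

# The cells in column B are list-like after loading, so convert them
# back into numpy arrays explicitly.
loaded['B'] = loaded['B'].apply(np.array)
print(type(loaded['B'].iloc[0]))
```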

1 Comment

Just be careful: pickling for your own use is fine, but pickle files can be incompatible across pandas versions.
0

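Use the following function to convert each value in the column.

def formatting(string_numpy):
    """formatting : Conversion of a stringified numpy array back to an array

    Args:
        string_numpy (str): e.g. '[6 7]', as written by to_csv
    Returns:
        numpy.ndarray: array of values
    """
    # numpy's string form separates elements with spaces, not commas,
    # so strip the brackets and split on whitespace.
    values = string_numpy.strip("[]").split()
    # int() assumes integer data, as in the question
    return np.array([int(v) for v in values])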

Then apply it to the column to convert the strings back into numpy arrays.

df['B'] = df['B'].apply(formatting)
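
The same conversion can also happen while the file is being read, through the `converters` argument of `pd.read_csv`. The sketch below is one way to do it, not the answerer's method: `parse_array` is a hypothetical helper, and it only handles short one-dimensional integer arrays (numpy abbreviates very long arrays with `...` in their string form, which this parser would not handle).

```python
import io

import numpy as np
import pandas as pd


def parse_array(text):
    """Hypothetical helper: turn a string like '[6 7]' into an integer array."""
    return np.array([int(v) for v in text.strip('[]').split()])


df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

# Use an in-memory buffer instead of a file on disk for the round trip.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# The converter receives each raw cell of column B as a string.
loaded = pd.read_csv(buffer, converters={'B': parse_array})
print(type(loaded.loc[0, 'B']))
```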
