How can I convert string to numpy.array inside a DataFrame column?

Question

My DataFrame db is built from a csv file, using read_csv. Values of column A look like this:

[1,2,5,6,48,125]

The "vector" can have a different length on every row, but each value is still a string. I can strip the [ and ] as follows:

db["A"] = db["A"].str.rstrip(']').str.lstrip('[')

The resulting values, such as 1,2,5,6,48,125, should be good input for np.fromstring. However, I am not able to apply this function in combination with pandas DataFrame.

When I try: db["A"] = np.fromstring(db["A"], sep=','), it says: a bytes-like object is required, not 'Series'. Using apply also does not work. Thanks for any tips.

jezrael · Accepted Answer · 2021-06-14 08:28:55Z

2

One idea is to convert the values to lists and then to np.array:

import ast
import numpy as np

db["A"] = db["A"].apply(lambda x: np.array(ast.literal_eval(x)))
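For anyone who wants to try this without the original CSV, here is a minimal self-contained sketch. The sample strings are made up for illustration (they are not the question's data), and the brackets are assumed to still be present, since ast.literal_eval needs them to parse each value as a list literal:

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV-derived column.
db = pd.DataFrame({"A": ["[1,2,5,6,48,125]", "[3,4]", "[]"]})

# literal_eval parses each string as a Python list literal;
# np.array then turns that list into an array.
db["A"] = db["A"].apply(lambda s: np.array(ast.literal_eval(s)))

first = db["A"].iloc[0]
lengths = db["A"].apply(len).tolist()
```

Each cell ends up holding its own array, so rows of different lengths are not a problem; the column itself has object dtype.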

Hue · Accepted Answer · 2021-06-14 08:44:26Z

0

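import numpy as np

# Parse each "[1,2,5]"-style string into its own integer array, row by row.
arrays = []
for i in range(len(db)):
    arrays.append(np.array(db["A"].iloc[i].strip("[]").split(","), dtype=int))
db["A"] = arrays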

AvidJoe · Accepted Answer · 2021-06-14 09:01:45Z

0

np.fromstring() is built for this purpose, as you (OP) already pointed out. The problem is that it expects a single string, not a whole Series, so it has to be applied to each value separately.

The following addresses the problem:

import pandas as pd
import numpy as np

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]","[1,2,4,5,6]"]})
dataframe['data'] = dataframe['data'].apply(lambda x : np.fromstring(str(x).replace('[','').replace(']',''), sep=','))

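Each value in the column will be a 1-D NumPy array. Because np.fromstring defaults to dtype=float, the integers come back as floats.

Running dataframe.head() gives me this: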

    data
0   [1.0, 2.0, 4.0]
1   [1.0, 2.0, 4.0, 5.0]
2   [1.0, 2.0, 4.0, 5.0, 6.0]
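
If integer arrays are wanted instead of floats, one possible variation is to pass dtype=int to np.fromstring. This is a sketch on the same sample frame, assuming every string contains at least one number; str.strip("[]") is used here in place of the two replace calls:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({"data": ["[1,2,4]", "[1,2,4,5]", "[1,2,4,5,6]"]})

# strip("[]") removes the surrounding brackets; dtype=int keeps the values as integers.
dataframe["data"] = dataframe["data"].apply(
    lambda s: np.fromstring(s.strip("[]"), dtype=int, sep=",")
)

second = dataframe["data"].iloc[1]
```

Empty vectors such as "[]" are not covered by this sketch and would need separate handling.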
