0

My DataFrame db is built from a csv file, using read_csv. Values of column A look like this:

[1,2,5,6,48,125]

On every row, the "vector" can have a different length. But it is still a string. I can strip the [ and ] as follows:

db["A"] = db["A"].str.rstrip(']').str.lstrip('[')

The resulting values, such as 1,2,5,6,48,125, should be good input for np.fromstring. However, I am not able to apply this function in combination with pandas DataFrame.

When I try: db["A"] = np.fromstring(db["A"], sep=','), it says: a bytes-like object is required, not 'Series'. Using apply also does not work. Thanks for any tips.

3 Answers

2

One idea is to convert the values to lists and then to np.array:

import ast
import numpy as np

# literal_eval safely parses each string such as "[1,2,5]" into a Python list
db["A"] = db["A"].apply(lambda x: np.array(ast.literal_eval(x)))

0
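import numpy as np

# Convert each row on its own. Assigning db["A"] inside a loop would overwrite
# the whole column on every pass, and range(0, len(db)-1) would skip the last row.
db["A"] = [np.array([int(v) for v in s.strip("[]").split(",")]) for s in db["A"]]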

0

np.fromstring() is built for this purpose, as you (OP) already pointed out. The problem is that it expects a single string, while db["A"] is a whole Series, so it has to be applied to each value separately.

The following addresses the problem:

import pandas as pd
import numpy as np

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]","[1,2,4,5,6]"]})
dataframe['data'] = dataframe['data'].apply(lambda x : np.fromstring(str(x).replace('[','').replace(']',''), sep=','))

Each cell will then hold a 1-D NumPy array. The values are floats because np.fromstring defaults to dtype=float.

Running dataframe.head() gives me this:

    data
0   [1.0, 2.0, 4.0]
1   [1.0, 2.0, 4.0, 5.0]
2   [1.0, 2.0, 4.0, 5.0, 6.0]
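
If integer arrays are preferred over the floats shown above, a variation along the same lines could pass dtype=int. This is only a sketch and assumes every entry is a whole number:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]"]})

# strip('[]') removes the brackets at both ends of the string;
# dtype=int replaces the float default of np.fromstring
dataframe['data'] = dataframe['data'].apply(
    lambda x: np.fromstring(x.strip('[]'), dtype=int, sep=',')
)

parsed = dataframe['data'].tolist()
```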
