4

I'm reading a list from a pandas DataFrame cell.

>>from pandas import DataFrame as table
>>x = table.loc[table['person'] == int(123), table.columns != 'xyz']['segment'][0]
>>print("X = ",x)

where 'person' and 'segment' are my column names, and 'segment' contains a list of floating-point values. This prints:

>>X = [[39.414, 39.498000000000005]]

Now, when I try to convert this into a numpy array,

>>x = numpy.asarray(x)
>>x=x.astype(float)

I get the following error

ValueError: could not convert string to float: '[[39.414, 39.498000000000005]]'

I have tried parsing the string and removing any "\n", " ", or unnecessary quotes, but it does not work. Then I checked the dtype:

>>print("Dtype = ", x.dtype)
>>Dtype = <U30

I assume that we need to convert the U30 dtype into floats, but I am not sure how to do it. I am using numpy version 1.15.0.

All I want is to parse the above into a list of floating-point values.

2
  • can you add some mock data so this is reproducible? Commented Sep 11, 2018 at 16:20
  • Looks like you have a string representation of a list. Try using ast.literal_eval(x) first. Do it on the entire column to make this easier: df.segment = df.segment.apply(ast.literal_eval) Commented Sep 11, 2018 at 16:22

2 Answers

3

The datatype should have tipped you off. U30 here stands for a Unicode string of length 30 (which is what you'll see if you type len(x) on the original string, before the numpy.asarray call).

What you have is the string representation of a list, not a list of strings, floats, etc.

You need to use the ast library here:

import ast
import numpy as np

x = '[[39.414, 39.498000000000005]]'
x = ast.literal_eval(x)  # safely parse the string into a nested list
np.array(x, dtype=float)

array([[39.414, 39.498]])
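
If the whole column holds such strings, the same conversion can be applied column-wide, as the comment under the question suggests. A minimal sketch, assuming a mock DataFrame (the second row and its values are made up for illustration, and `df`, `cell` and `arr` are hypothetical names):

```python
import ast

import numpy as np
import pandas as pd

# Mock data (hypothetical): 'segment' holds string representations of lists.
df = pd.DataFrame({
    "person": [123, 456],
    "segment": ["[[39.414, 39.498000000000005]]", "[[1.5, 2.5], [3.0, 4.0]]"],
})

# Parse every cell once, turning each string into a real nested list.
df["segment"] = df["segment"].apply(ast.literal_eval)

# Select the cell for person 123 and convert it to a float array.
cell = df.loc[df["person"] == 123, "segment"].iloc[0]
arr = np.array(cell, dtype=float)
print(arr.shape)
```

Parsing the column once up front means later lookups already return lists, so no per-cell string handling is needed afterwards.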

2

For the specific format you see, consider np.fromstring. With string slicing you can also remove the unused dimension:

x = '[[39.414, 39.498000000000005]]'

res = np.fromstring(x[2:-2], sep=',')

# array([ 39.414,  39.498])

4 Comments

Hi! This works for the above example, but does not work for multi-dimensional arrays like this: x = '[[39.414, 39.498000000000005],[344.234234,442.23432]]'. In that case, x = ast.literal_eval(x) followed by np.array(x, dtype=float) is more suitable.
@appsdownload, Yes, hence the comment "For the specific format you see".
Oh. Thank you @jpp. Also can you help me understand which one would be more efficient if I have a specific format?
I'm not sure. That's a separate question, but look up the timeit module and you can test for yourself!
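
For anyone who wants to try the timeit suggestion, here is a rough sketch comparing the two approaches on the example string. The helper names `via_literal_eval` and `via_fromstring` and the repetition count are assumptions for illustration; timings depend on your machine and NumPy version, so none are quoted here.

```python
import ast
import timeit

import numpy as np

x = '[[39.414, 39.498000000000005]]'


def via_literal_eval(s):
    # General approach: parse the string into nested lists, then convert.
    return np.array(ast.literal_eval(s), dtype=float)


def via_fromstring(s):
    # Format-specific approach: strip the outer "[[" and "]]" and parse the
    # comma-separated numbers. Only valid for a single flat row.
    return np.fromstring(s[2:-2], sep=',')


# Time each approach over the same number of repetitions.
t_eval = timeit.timeit(lambda: via_literal_eval(x), number=1000)
t_from = timeit.timeit(lambda: via_fromstring(x), number=1000)
print("literal_eval:", t_eval, "s")
print("fromstring:  ", t_from, "s")
```

Note that the two functions return different shapes: the first keeps the nested (1, 2) structure, while the second returns a flat array of length 2.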
