I imported a CSV file with a column ['Price'] whose dtype is object.

I would like to make a histogram of the price distribution.

However, I do not know how to convert the dtype from 'object' to 'float'.

brandprice=product['Price'].values
brandprice

array(['2,143,562', '2,186,437', '2,214,903', ..., '-', '-', '-'], dtype=object)

map(float, brandprice) 

ValueError Traceback (most recent call last)
in ()
----> 1 map(float, brandprice)
ValueError: invalid literal for float(): 2,143,562

    Well, 2,143,562 is invalid for a float. You'd need to strip the commas out of that to get a valid number. Does 2,143,562 really represent 2143562? Commented Dec 1, 2015 at 9:38

2 Answers

This actually has nothing to do with using an array. float only accepts plain numeric literals: digits, an optional sign, a decimal point, or an exponent. It does not understand thousands separators, so the commas make the conversion fail.

If you call replace(',', '') on the string first to remove the commas, it parses fine:

>>> float("2,143,562")

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    float("2,143,562")
ValueError: invalid literal for float(): 2,143,562
>>> float("2,143,562".replace(',', ''))
2143562.0

Since you need to do this for the whole array, I suggest using map with a short function that you write yourself. Something like this:

def make_float(string):
    try:
        return float(string.replace(',', ''))
    except ValueError:
        return string

map(make_float, brandprice)

This strips the commas from each string and then attempts to convert it to a float. If the conversion fails, the original string is returned unchanged (your sample data contains strings like '-', which cannot be parsed). The result will therefore still contain those strings, so filter them out before plotting a histogram.
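
If you are on Python 3, where map returns a lazy iterator rather than a list, a variant along the following lines may be more convenient for a histogram. This is only a sketch: the helper name to_float_or_nan is made up for illustration, the sample list is copied from the question, and returning NaN for unparseable values is my assumption about what you want, not something the question specifies.

```python
import math

def to_float_or_nan(text):
    """Hypothetical helper: strip thousands separators, return NaN if unparseable."""
    try:
        return float(text.replace(',', ''))
    except ValueError:
        # Placeholders such as '-' end up here
        return float('nan')

# Sample values taken from the question
brandprice = ['2,143,562', '2,186,437', '2,214,903', '-', '-', '-']

# A list comprehension builds the full list eagerly
converted = [to_float_or_nan(s) for s in brandprice]

# Keep only the real numbers, e.g. before plotting a histogram
valid_prices = [p for p in converted if not math.isnan(p)]
print(valid_prices)
```

Returning NaN instead of the original string keeps the result purely numeric, so the unparseable entries are easy to drop in one step.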

1 Comment

@stephdata replace only applies to one string at a time so I instead suggested a more robust function to use in map. See my edit.

Your list contains two invalid characters, "," and "-", which raise an error when converting to float. Here is a flexible way to handle them that also lets you add more invalid characters later.

import re
# If your data is a numpy array, you can convert it to a list with `.tolist()`

a = ['2,143,562', '2,186,437', '2,214,903', '-', '-', '-']
rx = re.compile(',|-')  # regular expression matching the invalid characters

# Strip the invalid characters; use 0 when nothing is left (e.g. for '-')
a_filtered = [rx.sub(r'', i) if rx.sub(r'', i) else 0 for i in a]
print(list(map(float, a_filtered)))
# Output: [2143562.0, 2186437.0, 2214903.0, 0.0, 0.0, 0.0]

First, construct a simple regex containing all the invalid characters in your list. Then replace every invalid character with an empty string, and check that the result is not itself an empty string (that is what the if/else condition is for); if it is, use 0 instead.
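
To make the "add more invalid characters" part concrete, here is a rough sketch that builds the regex from a list of characters. The function name strip_chars_to_float and its parameters are hypothetical, chosen only for this example. Be aware that because "-" is stripped, a genuinely negative value such as '-5' would come out as 5.0, so this only suits data where "-" is purely a placeholder.

```python
import re

def strip_chars_to_float(values, invalid_chars=(',', '-'), default=0.0):
    """Hypothetical helper: remove invalid_chars from each string, then convert."""
    # Build a character class; re.escape keeps characters like '-' literal
    pattern = re.compile('[' + ''.join(re.escape(c) for c in invalid_chars) + ']')
    result = []
    for value in values:
        cleaned = pattern.sub('', value)
        # An empty string means the value was only invalid characters
        result.append(float(cleaned) if cleaned else default)
    return result

a = ['2,143,562', '2,186,437', '2,214,903', '-', '-', '-']
print(strip_chars_to_float(a))

# Adding another character to strip, e.g. a currency symbol
print(strip_chars_to_float(['$1,000'], invalid_chars=(',', '-', '$')))
```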
