Overview:

I have scraped some data off a website and put it into a Pandas DataFrame, but for some reason I can't seem to convert the data type from object to integer or float (for the purposes of this question, either is fine).

I have looked through a few posts which have thankfully helped me get this far, but everything I try doesn't seem to work.

A sample of the Dataset:

Condition_Type  State   Price      Year    Make         Model
In Stock        SA      $24,654    2017    Mazda        3
Used Car        VIC     $23,162    2016    Holden       Trax
Used Car        VIC     $15,777    2011    Volkswagen   Tiguan
Used Car        VIC     $12,634    2012    Volkswagen   Polo
In Stock        VIC     $70,501    2017    Volkswagen   Amarok

What I have attempted so far:

df["Price"] = df["Price"].str.replace("$","").astype(int)

ValueError: invalid literal for int() with base 10:

df["Price"] = df["Price"].astype(str).astype(int)

ValueError: invalid literal for int() with base 10:

pd.Series(df["Price"]).convert_objects(convert_numeric=True)

FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

pd.to_numeric(df["Price"], errors='coerce')

Returns NaN

pd.to_numeric(df["Price"], errors='ignore')

Values stay as objects

df["Price"] = df["Price"].astype(np.int64, inplace=True)

ValueError: invalid literal for int() with base 10:

This last one has worked in the past, but for some reason, it isn't working on this data-set.

Any ideas?

Thanks, Adrian

1 Answer

I think you need to escape the $ first, and then replace both $ and , with an empty string using Series.replace:

df["Price"] = df["Price"].replace([r"\$", ","], "", regex=True).astype(int)
print (df)
  Condition_Type State  Price  Year        Make   Model
0       In Stock    SA  24654  2017       Mazda       3
1       Used Car   VIC  23162  2016      Holden    Trax
2       Used Car   VIC  15777  2011  Volkswagen  Tiguan
3       Used Car   VIC  12634  2012  Volkswagen    Polo
4       In Stock   VIC  70501  2017  Volkswagen  Amarok

print (df['Price'].dtypes)
int32
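
As a variation on the same idea, here is a minimal sketch that rebuilds part of the sample data and strips both characters with a single character class before converting. The sample frame and the variable name cleaned are only for illustration, and pd.to_numeric with errors="coerce" is used on the assumption that some scraped rows may still be non-numeric after cleaning; that is a choice of this sketch, not part of the answer above.

```python
import pandas as pd

# Illustrative sample mirroring the question's Price column
df = pd.DataFrame({
    "Price": ["$24,654", "$23,162", "$15,777", "$12,634", "$70,501"],
    "Year": [2017, 2016, 2011, 2012, 2017],
})

# [$,] matches either a literal dollar sign or a comma;
# inside a character class the $ does not need escaping.
cleaned = df["Price"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" turns anything still non-numeric into NaN instead of raising
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

print(df["Price"].tolist())
print(df["Price"].dtype)
```

If any value is coerced to NaN the column becomes float rather than int, so it may be worth checking df["Price"].isna().sum() before relying on an integer dtype.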

5 Comments

This worked (thank you!) - can you explain why? You're replacing the $ sign with a comma - what is the "" doing in the code? I haven't done much work with Regular Expressions, but keen to learn more. I want to understand the why rather than you just give me the answer (which I appreciate a lot)
No, I don't replace it with a comma. I replace both $ and , with an empty string.
How best to learn regex is a hard question; it is a really huge area. But here you only need to escape the $, because in regex an unescaped $ means end of string.
That makes sense - I've always used replace with one argument; not two. That's pretty cool! And regex = True is just finding a match for the '$' and ','? Thanks for helping me!
regex=True means the patterns in the list are treated as regular expressions and matched as substrings, not only as whole values.
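
The points made in these comments can be checked with a small sketch like the one below. It uses Python's re module and a one-element Series; the variable names are only for illustration.

```python
import re
import pandas as pd

price = "$24,654"

# Unescaped "$" is the end-of-string anchor, so nothing visible is removed
unescaped = re.sub("$", "", price)

# Escaped "\$" matches the literal dollar sign
escaped = re.sub(r"\$", "", price)

s = pd.Series([price])

# Without regex=True, Series.replace only swaps cells that equal "$" exactly
whole_value = s.replace("$", "")

# With regex=True, each pattern in the list is matched as a substring
substring = s.replace([r"\$", ","], "", regex=True)

print(repr(unescaped), repr(escaped))
print(whole_value.iloc[0], substring.iloc[0])
```

This is probably also why the first attempt in the question failed: even once the dollar sign is handled correctly, the comma is still in the string, so int() cannot parse it.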
