
I have a DataFrame created from a JSON output that looks like this:

        Total Revenue    Average Revenue    Purchase count    Rate
Date    
Monday  1,304.40 CA$     20.07 CA$          2,345             1.54 %

The values are stored as strings, exactly as received from the JSON. I am trying to:

1) Remove all non-numeric characters from each entry (e.g. CA$ or %)
2) Convert the rate and revenue columns to float
3) Convert the count column to int

I tried to do the following:

df[column] = (df[column].str.split()).apply(lambda x: float(x[0]))

It works fine except when a value contains a comma (e.g. 1,465 fails whereas 143 works).

I tried several functions to replace "," with "", etc., but nothing has worked so far. I always get the following error:

ValueError: could not convert string to float: '1,304.40'
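To see which character triggers the error, here is a minimal reproduction in plain Python (no pandas) using the first sample value from the table. It assumes the unit has already been split off, as in the code above:

```python
# split() already removes the "CA$" token; the remaining comma is what float() rejects.
value = "1,304.40 CA$".split()[0]  # "1,304.40"

error_message = None
try:
    float(value)
except ValueError as exc:
    error_message = str(exc)

# Removing the thousands separator first makes the conversion succeed.
cleaned = float(value.replace(",", ""))

print(error_message)
print(cleaned)
```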

4 Answers

These strings use commas as thousands separators, so you have to remove them before calling float:

df[column] = (df[column].str.split()).apply(lambda x: float(x[0].replace(',', '')))

This can be simplified a bit by moving split inside the lambda:

df[column] = df[column].apply(lambda x: float(x.split()[0].replace(',', '')))
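A minimal end-to-end sketch of this answer, assuming the frame looks like the sample in the question (the DataFrame below is a hypothetical reconstruction of that single row, not the real JSON data). It also casts Purchase count to int, as the question asks:

```python
import pandas as pd

# Hypothetical reconstruction of the sample row from the question.
df = pd.DataFrame(
    {
        "Total Revenue": ["1,304.40 CA$"],
        "Average Revenue": ["20.07 CA$"],
        "Purchase count": ["2,345"],
        "Rate": ["1.54 %"],
    },
    index=pd.Index(["Monday"], name="Date"),
)

# Keep the first whitespace-separated token, drop thousands separators, convert.
for column in df.columns:
    df[column] = df[column].apply(lambda x: float(x.split()[0].replace(",", "")))

# Counts should be integers rather than floats.
df["Purchase count"] = df["Purchase count"].astype(int)

print(df)
print(df.dtypes)
```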

Another solution uses a list comprehension over the columns, because string methods such as str.split and str.replace are available only on a Series (a single DataFrame column), not on the whole DataFrame:

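import pandas as pd

# Keep the first token of each cell, drop thousands separators, convert to float
df = pd.concat([df[col].str.split()
                       .str[0]
                       .str.replace(',', '').astype(float) for col in df], axis=1)

# If needed, convert the Purchase count column to int
df['Purchase count'] = df['Purchase count'].astype(int)
print(df)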
         Total Revenue  Average Revenue  Purchase count  Rate
Date                                                        
Monday         1304.4            20.07            2345  1.54
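A different technique, not part of the answer above: strip every character that cannot be part of a number with a single regular expression via str.replace(..., regex=True). This is a sketch that assumes units such as CA$ and % never contain digits, dots or minus signs (a unit like m2 would break it). The DataFrame is a hypothetical reconstruction of the sample row:

```python
import pandas as pd

# Hypothetical reconstruction of the sample row from the question.
df = pd.DataFrame(
    {
        "Total Revenue": ["1,304.40 CA$"],
        "Average Revenue": ["20.07 CA$"],
        "Purchase count": ["2,345"],
        "Rate": ["1.54 %"],
    },
    index=pd.Index(["Monday"], name="Date"),
)

# Remove everything except digits, the decimal point and a minus sign in every
# column; this drops "CA$", "%", commas and spaces in one pass.
numeric = df.apply(lambda col: col.str.replace(r"[^0-9.\-]", "", regex=True)).astype(float)

# Counts should be integers rather than floats.
numeric["Purchase count"] = numeric["Purchase count"].astype(int)

print(numeric)
```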


I faced the same problem and solved it with the code below:

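import pandas as pd

# to_numeric does not understand thousands separators, so strip the commas first;
# with errors='coerce', any value that still cannot be parsed becomes NaN
df['Purchase count'] = pd.to_numeric(df['Purchase count'].str.replace(',', '', regex=False), errors='coerce')
print(df.dtypes)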
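A quick check of why the commas matter with pd.to_numeric, on a hypothetical two-value Series: thousands separators are not parsed, and errors='coerce' hides that by returning NaN instead of raising.

```python
import pandas as pd

counts = pd.Series(["2,345", "143"], name="Purchase count")

# "2,345" cannot be parsed, so errors="coerce" silently turns it into NaN.
coerced = pd.to_numeric(counts, errors="coerce")

# Removing the commas first yields the intended values.
fixed = pd.to_numeric(counts.str.replace(",", "", regex=False), errors="coerce")

print(coerced.tolist())
print(fixed.tolist())
```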


The solution below worked for me:

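import pandas as pd

# regex=True replaces the comma anywhere inside each string, not only whole-cell matches
df['Purchase count'] = df['Purchase count'].replace(',', '', regex=True).astype(float)

# .dtype shows the column's data type; type() would only report pandas.Series
print('dtype:', df['Purchase count'].dtype)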
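The regex=True argument matters in this answer. Without it, Series.replace only replaces cells whose entire value equals the search string; with it, the pattern is matched inside each string. A small sketch on a hypothetical Series:

```python
import pandas as pd

counts = pd.Series(["2,345", "143"])

# Without regex=True, only cells exactly equal to "," would be replaced,
# so "2,345" is left untouched.
unchanged = counts.replace(",", "")

# With regex=True, the comma is removed from inside each string.
stripped = counts.replace(",", "", regex=True)

as_float = stripped.astype(float)

print(unchanged.tolist())
print(stripped.tolist())
print(as_float.dtype)
```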
