2

I want to add a new column called '2016 Salary ($)' to the DataFrame income, containing the employee pay from the Salary Paid column as a number. To get that number I need to remove the '$' and ',' characters from each value.

But when I do this, I get the error:

'Could not convert string to float'

I tried to follow the hint, but it is not working:

income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])
income
3
  • Please show part of the data and the error you got, and format the code as a code block using CTRL+K. Commented Sep 8, 2019 at 6:01
  • There is a column called Name (jack), with Salary = $204,546,289.35 and Year = 2016. I want to add a column that takes just the number from the salary, so that income['2016 Salary ($)'] = 204546289.35. When I try to write the code, it says it cannot convert string to float. Commented Sep 8, 2019 at 6:10
  • income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$,').astype(float) Commented Sep 8, 2019 at 6:20

4 Answers

2

Add Series.str.replace first:

# Parentheses let the method chain span several lines without a syntax error.
income['2016 Salary ($)'] = (income['SalaryPaid'].str.replace(',', '')
                                                 .str.strip('$')
                                                 .astype(float))
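To see why stripping only '$' fails while removing the commas first works, here is a minimal sketch. The sample rows are made up for illustration (the first value is the one quoted in the question's comments), and `regex=False` is passed so that the comma is treated as a literal character.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
income = pd.DataFrame({"SalaryPaid": ["$204,546,289.35", "$1,250.00"]})

# Stripping only '$' leaves the commas in place, so the float conversion fails.
try:
    income["SalaryPaid"].str.strip("$").astype(float)
    failed = False
except ValueError:
    failed = True

# Removing the commas first lets the conversion succeed.
income["2016 Salary ($)"] = (
    income["SalaryPaid"]
    .str.replace(",", "", regex=False)
    .str.strip("$")
    .astype(float)
)
print(income)
```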

Or, a better solution if you create the DataFrame from a file: use the thousands parameter in read_csv:

income = pd.read_csv(file, thousands=',')

income['2016 Salary ($)']= income['SalaryPaid'].str.strip('$').astype(float)
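As far as I can tell, `thousands` only takes effect for fields that otherwise parse as numbers, so a value with a leading '$' may still come back as a string with its commas intact. A hedged alternative is the `converters` parameter of `read_csv`, sketched below. The CSV text and the helper `parse_money` are assumptions made up for this example, not part of the original question.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = 'Name,SalaryPaid,Year\nJack,"$204,546,289.35",2016\n'


def parse_money(text):
    """Drop '$' and ',' from a currency string and return a float."""
    return float(text.replace("$", "").replace(",", ""))


# The converter runs on every raw string in the SalaryPaid column while reading.
income = pd.read_csv(io.StringIO(csv_text), converters={"SalaryPaid": parse_money})
income["2016 Salary ($)"] = income["SalaryPaid"]
print(income)
```

With this approach the column is already numeric when the DataFrame is created, so no further string handling is needed.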

2

Try something like this:

Data:

dic = {'Name':['John','Peter'],'SalaryPaid':['$204,546,289.35','$500,231,289.35'],'Year':['2008','2009']}
df1 = pd.DataFrame(dic)
df1

    Name    SalaryPaid      Year
0   John    $204,546,289.35 2008
1   Peter   $500,231,289.35 2009

Code:

df1['SalaryPaid'] = df1['SalaryPaid'].str.replace(',', '')
# If you want the result as a string : 
df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$')
# if you want the result as float : 
#df1['2016 Salary ($)']= df1['SalaryPaid'].str.strip('$').astype(float) 


df1

Result:

    Name    SalaryPaid  Year    2016 Salary ($)
0   John    $204546289.35   2008    204546289.35
1   Peter   $500231289.35   2009    500231289.35

1 Comment

pd.to_numeric(df.SalaryPaid.replace(['\$',','],'',regex=True),errors='coerce')
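The one-liner in the comment above can be unpacked as follows. This is a minimal sketch with made-up data, including one deliberately unparseable value to show what `errors='coerce'` does.

```python
import pandas as pd

# Hypothetical data; "n/a" is included to show how bad values are handled.
df = pd.DataFrame({"SalaryPaid": ["$204,546,289.35", "n/a"]})

# With regex=True, Series.replace removes every '$' and ',' inside each string.
cleaned = df["SalaryPaid"].replace([r"\$", ","], "", regex=True)

# errors='coerce' turns entries that cannot be parsed into NaN instead of raising.
df["2016 Salary ($)"] = pd.to_numeric(cleaned, errors="coerce")
print(df)
```

Compared with `astype(float)`, this does not stop at the first malformed value, which can be convenient for messy input, at the cost of silently producing NaN.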
1

I created a dummy DataFrame matching your requirement and performed the same operation you described above, and it worked fine for me.

import pandas as pd
df = pd.DataFrame(columns=['AA','BB'])
df['AA'] = ['$12,20','$13,30']
df['BB'] = ['X','Y']
print(df)

Output:

       AA BB
0  $12,20  X
1  $13,30  Y

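# regex=False treats '$' as a literal character rather than a regex anchor.
df['AA'] = df['AA'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)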
print(df)

Output:

       AA BB
0  1220.0  X
1  1330.0  Y

In my opinion, the error is in the second line of your code, where you apply the lambda: instead of "income['2016 Salary ($)'].apply(lambda X:X['Salary Paid'])" it should be "income['2016 Salary ($)'].apply(lambda X:X['SalaryPaid'])". I think there is a typo in the column name SalaryPaid.
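Independently of the column name, it may help to recall that `Series.apply` calls the function once per element, so the lambda receives a single number rather than a row. The sketch below, on a made-up one-row frame, shows that indexing that number raises an error, while `DataFrame.apply` with `axis=1` does pass whole rows.

```python
import pandas as pd

# Hypothetical one-row frame; the column names follow the question.
income = pd.DataFrame({"SalaryPaid": ["$1,250.00"]})
income["2016 Salary ($)"] = (
    income["SalaryPaid"].str.replace(",", "", regex=False).str.strip("$").astype(float)
)

# Series.apply hands the lambda one number at a time, so indexing it fails.
try:
    income["2016 Salary ($)"].apply(lambda x: x["SalaryPaid"])
    error_name = None
except (TypeError, IndexError) as exc:
    error_name = type(exc).__name__

# DataFrame.apply with axis=1 passes each row, where label lookup works.
per_row = income.apply(lambda row: row["2016 Salary ($)"], axis=1)
print(error_name, per_row.tolist())
```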


0

You can also do:

def convert(x):
    return float(x.replace('$','').replace(',',''))

income['2016 Salary ($)'] = income['Salary Paid'].apply(convert)

or

import re
def convert(x):
    # Keep only digits and the decimal point, then parse the result as a float.
    return float(''.join(re.findall(r'[\d.]', x)))
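The two helpers can be compared side by side in a small sketch. The names `convert_replace` and `convert_regex` are made up here to keep both versions in one script. One difference worth noting: the regex version keeps only digits and dots, so it would drop a leading minus sign.

```python
import re


def convert_replace(text):
    """Remove '$' and ',' explicitly, then parse as float."""
    return float(text.replace("$", "").replace(",", ""))


def convert_regex(text):
    """Keep only digits and '.', then parse as float."""
    return float("".join(re.findall(r"[\d.]", text)))


sample = "$204,546,289.35"  # value quoted in the question's comments
print(convert_replace(sample), convert_regex(sample))

# A negative amount shows where the two approaches diverge.
negative = "-$5.00"
print(convert_replace(negative), convert_regex(negative))
```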
