8

I am fairly new to Python so forgive me this simple question. I'm trying to convert string to float. Here is a sample of the data:

0     10.65%
1      7.90%

When I try:

 df['int_rate'] = df['int_rate'].astype('float')

I get:

ValueError: could not convert string to float: '13.75%'

When I try:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

And check my data, I get:

0     None
1     None

Any ideas what I'm doing wrong? Many thanks!

4
  • df['int_rate'] = df['int_rate'][:-1].astype('float') cause: % can't convert to float Commented Sep 20, 2017 at 12:36
  • 2
    df['int_rate'].apply(lambda x: x.rstrip("%")).astype(float) Commented Sep 20, 2017 at 12:36
  • have you looked at this? stackoverflow.com/questions/12432663/… Commented Sep 20, 2017 at 12:37
  • 1
    df['int_rate'] = float( df['int_rate'].replace("%","")) Commented Sep 20, 2017 at 12:37

3 Answers 3

6

You can use Series.replace with parameter regex=True for replace substrings:

df = pd.DataFrame({'int_rate':['10.65%','7.90%']})
df['int_rate'] = df['int_rate'].replace("%","", regex=True).astype(float)
print (df)
   int_rate
0     10.65
1      7.90

Or Series.str.replace:

df['int_rate'] = df['int_rate'].str.replace("%","")
print (df)
  int_rate
0    10.65
1     7.90
2         

Or Series.str.rstrip:

df['int_rate'] = df['int_rate'].str.rstrip("%").astype(float)
print (df)
   int_rate
0     10.65
1      7.90

See difference without it:

df = pd.DataFrame({'int_rate':['10.65%','7.90%', '%']})

df['int_rate_subs'] = df['int_rate'].replace("%","", regex=True)
df['int_rate_val'] = df['int_rate'].replace("%","")
print (df)
  int_rate int_rate_subs int_rate_val
0   10.65%         10.65       10.65%
1    7.90%          7.90        7.90%
2        %                           
Sign up to request clarification or add additional context in comments.

4 Comments

Yes, I also found that setting regex=True makes it work, however, can you explain why? At first I thought the problem to be that the % needs to be escaped with %% (as requred by old style string interpolation), but that makes no difference. I don't get why treating % as a regex fixes the problem.
Thanks so much, Jezrael and others for chipping in. Regex=True did the trick. As mhawke, I would also be curious as to why? Many thanks!
Really hard get nice answer, but replace looks for exact matches unless you pass a regex pattern and param regex=True
Thanks Jezrael, that makes sense.
5

As you guessed, ValueError: could not convert string to float: '13.75%' indicates that the % character blocks the convertion.

Now when you try to remove it:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

You set inplace=True in your replacement, which as the name suggests changes the dataframe in-place, so replace() method call returns None. Thus you store None in df['int_rate'] and end up with a column containing only None values. You should either do:

df['int_rate'] = df['int_rate'].replace("%","") 

or

df['int_rate'].replace("%","", inplace=True)

1 Comment

Thank you, Guillaume. That makes sense. However, for some bizarre reason, the % sign is not coming off with either method. The rows remain unchanged. Any thoughts?
2

Since you're using a string, you could convert the value to a float using

float(df['int_rate'][:-1])

This reads the string from the first position to the second to last position, 10.65 instead of 10.65%.

4 Comments

Thanks Tyler. I get TypeError: cannot convert the series to <class 'float'>
No problem, Minsky, what version of Python are you using? I'll take a closer look at the issue to see if I can replicate the error you're receiving. I should add that I'm using 3.6.1 above.
Appreciate it. I'm on Python 3.6. I'm practicing on the Lending Club data-set (directly from their website).
Just to clarify, I'm using the Anaconda version of 3.6. The complete error message is Traceback (most recent call last): File "C:\Users\avesu\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-56-340f2d2704e2>", line 1, in <module> float(df['int_rate'][:-1]) File "C:\Users\avesu\Anaconda3\lib\site-packages\pandas\core\series.py", line 97, in wrapper "{0}".format(str(converter))) TypeError: cannot convert the series to <class 'float'>

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.