Pandas DataFrame: Cannot convert string into a float

Question

I have a column Column1 in a pandas dataframe which is of type str, values which are in the following form:

import pandas as pd
df = pd.read_table("filename.dat")
type(df["Column1"].ix[0])   #outputs 'str'
print(df["Column1"].ix[0])

which outputs '1/350'. So, this is currently a string. I would like to convert it into a float.

I tried this:

df["Column1"] = df["Column1"].astype('float64', raise_on_error = False)

But this didn't change the values into floats.

This also failed:

df["Column1"] = df["Column1"].convert_objects(convert_numeric=True)

And this failed:

df["Column1"] = df["Column1"].apply(pd.to_numeric, args=('coerce',))

How do I convert all the values of column "Column1" into floats? Could I somehow use regex to remove the parentheses?

EDIT:

The line

df["Meth"] = df["Meth"].apply(eval)

works, but only if I use it twice, i.e.

df["Meth"] = df["Meth"].apply(eval)
df["Meth"] = df["Meth"].apply(eval)

Why would this be?

Community · Accepted Answer · 2017-05-23 12:26:21Z

4

You need to evaluate the expression (e.g. '1/350') in order to get the result, for which you can use Python's eval() function.

By wrapping Panda's apply() function around it, you can then execute the eval() function on every value in your column. Example:

df["Column1"].apply(eval)

As you're interpreting literals, you can also use the ast.literal_eval function as noted in the docs. Update: This won't work, as the use of literal_eval() is still restricted to additions and subtractions (source).

Remark: as mentioned in other answers and comments on this question, the use of eval() is not without risks, as you're basically executing whatever input is passed in. In other words, if your input contains malicious code, you're giving it a free pass.

Alternative option:

# Define a custom div function
def div(a,b):
    return int(a)/int(b)

# Split each string and pass the values to div
df_floats = df['col1'].apply(lambda x: div(*x.split('/')))

Second alternative in case of unclean data:

By using regular expressions, we can remove any non-digits appearing resp. before the numerator and after the denominator.

# Define a custom div function (unchanged)
def div(a,b):
    return int(a)/int(b)

# We'll import the re module and define a precompiled pattern
import re
regex = re.compile('\D*(\d+)/(\d+)\D*')

df_floats = df['col1'].apply(lambda x: div(*regex.findall(x)[0]))

We'll lose a bit of performance, but the upside is that even with input like '!erefdfs?^dfsdf1/350dqsd qsd qs d', we still end up with the value of 1/350.

Performance:

When timing both options on a dataframe with 100.000 rows, the second option (using the user defined div function) clearly wins:

using eval: 1 loop, best of 3: 1.41 s per loop
using div: 10 loops, best of 3: 159 ms per loop
using re: 1 loop, best of 3: 275 ms per loop

edited May 23, 2017 at 12:26

CommunityBot

11 silver badge

answered Aug 1, 2016 at 20:19

DocZerø

8,61511 gold badges44 silver badges75 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

ShanZhengYang Over a year ago

"this is currently a string. I would like to convert it into a float"

ShanZhengYang Over a year ago

No problem :) Thank you for the help!

Jeff Over a year ago

+1 for the benchmark there, I was surprised at first that eval was slower, but maybe it makes sense. eval needs to account for whatever code might have been passed in, while your alternative explicitly works on the /

ShanZhengYang Over a year ago

The error I get for the above div function (which I like) is alueError: invalid literal for int() with base 10: "'9" I'm not sure how to fix this

Jeff Over a year ago

Looks like there's some errors in the formatting of the data then, with an extra single quote. That may mean there are other issues, but in this case, in this answer, change it to return int(a.strip("'")/int(b.strip("'")) - that is a double-quote, single-quote, double-quote. Should fix it.

|

piRSquared · Accepted Answer · 2016-08-01 20:50:29Z

3

I hate advocating for the use of eval. I didn't want to spend time on this answer but I was compelled because I don't want you to use eval.

So I wrote this function that works on a pd.Series

def do_math_in_string(s):
    op_map = {'/': '__div__', '*': '__mul__', '+': '__add__', '-': '__sub__'}
    df = s.str.extract(r'(\d+)(\D+)(\d+)', expand=True)
    df = df.stack().str.strip().unstack()
    df.iloc[:, 0] = pd.to_numeric(df.iloc[:, 0]).astype(float)
    df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2]).astype(float)
    def do_op(x):
        return getattr(x[0], op_map[x[1]])(x[2])
    return df.T.apply(do_op)

Demonstration

s = pd.Series(['1/2', '3/4', '4/5'])

do_math_in_string(s)

0    0.50
1    0.75
2    0.80
dtype: float64

do_math_in_string(pd.Series(['1/2', '3/4', '4/5', '6+5', '11-7', '9*10']))

0     0.50
1     0.75
2     0.80
3    11.00
4     4.00
5    90.00
dtype: float64

Please don't use eval.

edited Aug 1, 2016 at 20:50

answered Aug 1, 2016 at 20:39

piRSquared

296k68 gold badges509 silver badges654 bronze badges

6 Comments

ShanZhengYang Over a year ago

"You'll have better self esteem later in life if you avoid it now." Why?

piRSquared Over a year ago

That comment is me being comedically dramatic. But in practice, using eval is a recipe for code that is difficult to manage, maintain, debug, and keep secure. It's very generally regarded as bad practice.

Jeff Over a year ago

If he's not dealing with user input, how would using eval here be a problem?

piRSquared Over a year ago

I'll say a couple more things: One, an input file can very well be user input. Two, it still makes it more difficult to track bugs and maintain (my belief). Three, all this is coding philosophy and doesn't really matter as the OP can do what he wants. My answer still does what it does, regardless of opinions on eval.

Jeff Over a year ago

I was actually just curious if there was anything more to it than the problem that it evaluates whatever a user types in. It's a good thing to note for the OP though, that eval can be dangerous if the assumption that this isn't somehow involving user input is wrong.

|

Jeff · Accepted Answer · 2016-08-01 20:13:49Z

2

You can do it by applying eval to the column:

data = {'one':['1/20', '2/30']}
df = pd.DataFrame(data)

In [8]: df['one'].apply(eval)
Out[8]:
0    0.050000
1    0.066667
Name: one, dtype: float64

answered Aug 1, 2016 at 20:13

Jeff

2,2381 gold badge19 silver badges30 bronze badges

Collectives™ on Stack Overflow

Pandas DataFrame: Cannot convert string into a float

3 Answers 3

6 Comments

Demonstration

6 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

Demonstration

6 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related