2

I have a pandas dataframe that looks like this:

[Image: Dataframe]

I want to take the log of each value in the dataframe.

So that seemed like no problem at first, and then: data.apply(lambda x:math.log(x)) returned a type error (cannot convert series to class 'float').

Okay, fine--so, while type checking is often frowned upon, I gave it a shot (also tried casting x to a float, same problem):

isinstance((data['A1BG'][0]), np.float64) returns true, so I tried:

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x). That ran without any errors, but it didn't change any values in my dataframe.

What am I doing wrong?

Thanks!

3 Comments
  • Is the Hybridization column an index? If not, set it as the index first and then call applymap. Commented Aug 2, 2017 at 13:51
  • Yes--earlier in the notebook: data = data.set_index("Hybridization REF") Commented Aug 2, 2017 at 13:52
  • Yes. I'll write an answer explaining what's wrong. Commented Aug 2, 2017 at 13:53

3 Answers

2

What happens is that df.apply passes a pd.Series object to the lambda: it operates on one Series (one column) at a time, not one float at a time.

So, with

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)

isinstance(x, np.float64) is never true (because x is a pd.Series), so the else branch always runs and every column comes back unchanged.
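To see this for yourself, here is a minimal sketch that records what apply hands to the callback. The frame `df` and the helper `record_type` are made up for illustration; they are not from the question.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 8.0]})

seen = []

def record_type(x):
    # Remember the type of whatever apply passes in, then return it untouched.
    seen.append(type(x).__name__)
    return x

df.apply(record_type)
print(set(seen))  # only Series objects reach the callback

# The guarded lambda therefore always takes the else branch.
unchanged = df.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)
print(unchanged.equals(df))

# On a single column, Series.apply does go element by element, so math.log works.
logged_column = df["a"].apply(math.log)
print(logged_column)
```

This is why the type check in the question ran without errors yet changed nothing.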

To remedy this, you can operate on one element at a time, using df.applymap (renamed DataFrame.map in pandas 2.1, where applymap is deprecated):

data.applymap(math.log)

Using apply, the solution is similar, but use np.log, which accepts a whole Series (the lambda is optional; data.apply(np.log) is equivalent):

data.apply(lambda x: np.log(x))

Or, alternatively (pandas 0.20 and later):

data.transform(lambda x: np.log(x))

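In a quick test, df.applymap was the fastest, followed by df.apply and df.transform, though that ordering held only for small data; on larger frames the vectorised np.log via apply or transform is the better choice (see the comments below).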
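If you want to check the timings on your own data, a rough sketch like the following should do. The frame size, the column names and the `candidates` dict are arbitrary choices for illustration, and the numbers will vary with machine, pandas version and data size.

```python
import math
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Strictly positive values so the log is defined everywhere.
df = pd.DataFrame(rng.random((1000, 10)) + 0.1, columns=list("abcdefghij"))

# applymap was renamed DataFrame.map in pandas 2.1; use whichever exists.
elementwise = df.map if hasattr(df, "map") else df.applymap

candidates = {
    "elementwise math.log": lambda: elementwise(math.log),
    "apply(np.log)": lambda: df.apply(np.log),
    "transform(np.log)": lambda: df.transform(np.log),
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5)
    print(f"{name:>22}: {seconds:.4f} s for 5 runs")
```

Try a few different shapes for `df` before drawing conclusions; the ranking can flip as the frame grows.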


3 Comments

That's a great solution, thank you! Ysearka's solution also works, but this one gives me something that will be more applicable to a wider range of future uses.
@JulianStanley You can actually shorten your expression. Just pass the callback without a lambda - applymap(callback). Pandas will call it on every element for you.
@JulianStanley Just tested on bigger data, and you should stick to apply or transform for large data. Cheers.
1

When you call apply on a DataFrame, the function is applied to each column as a pandas Series, not to individual floats (unlike apply on a Series, which works element by element). So instead of math.log, you should use np.log.

EDIT:

With examples it's always better:

import math
import numpy as np
import pandas as pd

test = pd.DataFrame({'a': np.random.random(5), 'b': np.random.random(5)})

    a           b
0   0.430111    0.420516
1   0.367704    0.785093
2   0.034130    0.839822
3   0.310254    0.755089
4   0.098302    0.136995

If you try the following, it won't work:

test.apply(lambda x: math.log(x))

TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index a')

But this will do the job:

test.apply(lambda x: np.log(x))

    a           b
0   -0.843711   -0.866273
1   -1.000476   -0.241953
2   -3.377588   -0.174565
3   -1.170364   -0.280919
4   -2.319708   -1.987811
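As a side note, NumPy ufuncs such as np.log can also be called on the DataFrame directly, without apply. A small sketch, using a fixed frame in place of the random one above so it can be reproduced:

```python
import numpy as np
import pandas as pd

# Fixed values chosen for illustration, in place of the random frame above.
test = pd.DataFrame({"a": [0.5, 1.0, 2.0], "b": [1.0, 4.0, 8.0]})

direct = np.log(test)           # ufunc called on the whole frame
via_apply = test.apply(np.log)  # column-by-column, as in the answer

print(type(direct).__name__)    # the result is still a DataFrame
print(np.allclose(direct, via_apply))
print(direct)
```

Both routes keep the index and column labels, so which one to use is mostly a matter of taste.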

1 Comment

Bam. Quick, simple, and worked like a charm. Thanks much!
0

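Try this, which applies math.log to each element of a column (math.log itself cannot take a whole list or Series):

 import math
 data.apply(lambda x: x.map(math.log))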

1 Comment

This also works, for a similar reason as ColdSpeed's solution--changing an array to a list. Thanks for the help!
