Pandas value_counts() for loop fails as lambda

Question

I have a dataframe with three variables, and I want to create a dictionary of the relative count of each label for each variable.

I easily wrote a for loop that outputs exactly what I want; however, my lambda produces weird results.

Here is the data:

In [3]:

import pandas as pd
raw_data = {
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
    }
df = pd.DataFrame(raw_data)
df
Out[3]:
  category1 category2 category3
0       Red     Plane    Orange
1       Red     Plane    Orange
2       Red     Plane    Orange
3     Green       Car    Banana

This for loop produces the exact output I want:

In [4]:

forloop = {}
for column in df:
    forloop[column] = df[column].value_counts(normalize=True).to_dict()
forloop
Out[4]:
{'category1': {'Green': 0.25, 'Red': 0.75},
 'category2': {'Car': 0.25, 'Plane': 0.75},
 'category3': {'Banana': 0.25, 'Orange': 0.75}}

However, this lambda fails for some unknown reason:

In [6]:

ratio = lambda x: x.value_counts(normalize=True).to_dict()
output_lambda = df.apply(ratio)
output_lambda
Out[6]:
category1    <built-in method values of dict object at 0x10...
category2    <built-in method values of dict object at 0x10...
category3    <built-in method values of dict object at 0x10...
dtype: object

EdChum · Accepted Answer · 2015-06-02 22:12:26Z

I can't actually tell what is going wrong here, other than that the dict returned by the lambda is not being stored as a dict. Here is a roundabout way to achieve what you want:

In [86]:
ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()]).apply(lambda x: x[0]).to_dict()
output_lambda

Out[86]:
{'category1': {'Green': 1, 'Red': 3},
 'category2': {'Car': 1, 'Plane': 3},
 'category3': {'Banana': 1, 'Orange': 3}}

It looks like apply is binding a method object as the column value rather than storing the dict itself. What I'm doing above is returning the value_counts dict wrapped in a single-element list, and then calling apply again to unpack that list. Note that the snippet above calls value_counts() without normalize=True, so it shows raw counts; pass normalize=True to get the ratios. The initial apply call on its own returns each dict inside a single-element list:

In [87]:
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()])
output_lambda

Out[87]:
category1        [{'Green': 1, 'Red': 3}]
category2        [{'Plane': 3, 'Car': 1}]
category3    [{'Banana': 1, 'Orange': 3}]
dtype: object
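
If all you need is the nested dict, one way to sidestep apply altogether is a dict comprehension over the columns, which is essentially the for loop from the question written as a single expression. A minimal sketch, assuming the same df as in the question (the name ratios is only for illustration):

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

# Build one {label: relative frequency} dict per column,
# without going through DataFrame.apply at all.
ratios = {column: df[column].value_counts(normalize=True).to_dict()
          for column in df}
print(ratios)
```

Because no dict ever passes through apply, this does not depend on how apply wraps the values a function returns.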

Community · Accepted Answer · 2017-05-23 12:05:58Z

I guess the problem is that the lambda function returns an object that pandas cannot transform into a Series or DataFrame (though this should be confirmed by pandas experts).

You can achieve almost the same thing with slight modifications of your code:

ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(ratio).to_dict()

If you do not want the NaN entries in output_lambda, you can use a solution like the one proposed in this answer: https://stackoverflow.com/a/26033302/4709400
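
As a rough sketch of one way to drop those NaN entries (not necessarily what the linked answer does), you can iterate over the columns of the apply result and call dropna() on each one before converting it to a dict. This assumes df is the frame from the question; the names table and clean are hypothetical:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

ratio = lambda x: x.value_counts(normalize=True)

# apply aligns the per-column results on the union of all labels,
# so labels that do not occur in a column show up as NaN.
table = df.apply(ratio)

# Drop the NaN rows column by column before converting to a dict.
clean = {column: series.dropna().to_dict()
         for column, series in table.items()}
print(clean)
```

The NaN values appear only because the three columns share no labels, so filtering per column gets back to the same nested dict the for loop produces.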

edited May 23, 2017 at 12:05 by CommunityBot

answered Jun 2, 2015 at 22:12 by stellasia
