4

I have a DataFrame with three variables, and I want to build a dictionary holding the relative frequency of each label within each variable.

A for loop gives me exactly what I want, but the equivalent lambda produces weird results.

Here is the data:

In [3]:

import pandas as pd
raw_data = {
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
    }
df = pd.DataFrame(raw_data)
df
Out[3]:
  category1 category2 category3
0       Red     Plane    Orange
1       Red     Plane    Orange
2       Red     Plane    Orange
3     Green       Car    Banana

This for loop produces the exact output I want:

In [4]:

forloop = {}
for column in df:
    forloop[column] = df[column].value_counts(normalize=True).to_dict()
forloop
Out[4]:
{'category1': {'Green': 0.25, 'Red': 0.75},
 'category2': {'Car': 0.25, 'Plane': 0.75},
 'category3': {'Banana': 0.25, 'Orange': 0.75}}
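
For reference, the same loop can be written as a dict comprehension. This is only a sketch of an equivalent form, not a fix for the lambda; the name `comprehension` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

# Iterating over a DataFrame yields its column labels, as in the for loop.
# `comprehension` is a hypothetical name, not part of the original post.
comprehension = {
    column: df[column].value_counts(normalize=True).to_dict()
    for column in df
}
print(comprehension)
```

Because it never goes through df.apply, this form avoids whatever apply does with the returned dict.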

However, the same logic written as a lambda and passed to df.apply fails for a reason I can't identify:

In [6]:

ratio = lambda x: x.value_counts(normalize=True).to_dict()
output_lambda = df.apply(ratio)
output_lambda
Out[6]:
category1    <built-in method values of dict object at 0x10...
category2    <built-in method values of dict object at 0x10...
category3    <built-in method values of dict object at 0x10...
dtype: object

2 Answers

1

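I can't tell exactly what is going wrong here, other than that apply is not unpacking the result of the to_dict call. Here is a roundabout way to get the dict of dicts (note that this version returns raw counts; pass normalize=True to value_counts for relative frequencies):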

In [86]:
ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()]).apply(lambda x: x[0]).to_dict()
output_lambda

Out[86]:
{'category1': {'Green': 1, 'Red': 3},
 'category2': {'Car': 1, 'Plane': 3},
 'category3': {'Banana': 1, 'Orange': 3}}

It looks like apply binds a method object as the column value instead of the dict itself. The workaround above returns the value_counts dict wrapped in a single-element list, then calls apply again to unwrap that element. After the first apply call, each column holds a one-element list containing the dict:

In [87]:
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()])
output_lambda

Out[87]:
category1        [{'Green': 1, 'Red': 3}]
category2        [{'Plane': 3, 'Car': 1}]
category3    [{'Banana': 1, 'Orange': 3}]
dtype: object

1

My guess is that the lambda returns an object that pandas cannot transform into a Series or DataFrame (this should be confirmed by pandas experts).

You can achieve almost the same thing with slight modifications of your code:

ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(ratio).to_dict()

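Because the intermediate result is a DataFrame indexed by every label, labels that do not occur in a column show up as NaN. If you do not want the NaN values in output_lambda, you can use a solution like the one proposed in this answer: https://stackoverflow.com/a/26033302/4709400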
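
As a rough sketch of one way to drop those NaN entries without the linked answer (the linked answer may take a different approach), each column of the intermediate DataFrame can be cleaned with dropna before converting it to a dict. The name `cleaned` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

ratio = lambda x: x.value_counts(normalize=True)

# apply aligns the per-column results on the union of all labels,
# so labels missing from a column become NaN in that column.
frequencies = df.apply(ratio)

# Drop the NaN entries column by column before building the nested dict.
# `cleaned` is a hypothetical name, not part of the original answer.
cleaned = {
    column: values.dropna().to_dict()
    for column, values in frequencies.items()
}
print(cleaned)
```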
