
I have the dataframe below:


import pandas as pd

details = {
    'container_id': [1, 2, 3, 4, 5, 6],
    'container': ['black box', 'orange box', 'blue box', 'black box', 'blue box', 'white box'],
    'fruits': ['apples, black currant', 'oranges', 'peaches, oranges', 'apples', 'apples, peaches, oranges', 'black berries, peaches, oranges, apples'],
}

# Create the DataFrame
df = pd.DataFrame(details)

I want to count how often each fruit appears across all rows, treating each fruit in a cell as a separate item.

I tried this code:

df['fruits'].str.split(expand=True).stack().value_counts()

However, this counts black twice instead of counting black currant once and black berries once.

2 Answers

Your approach works once you specify the delimiter. With no argument, str.split splits on whitespace, which is why black currant and black berries are each broken in two. Note that splitting on a bare comma leaves leading whitespace on each item unless the delimiter is a comma followed by a space. To be safe, add a str.strip step:

df['fruits'].str.split(',', expand=False).explode().str.strip().value_counts()

Your original approach, with the delimiter set to a comma followed by a space (you can also apply str.strip after stack if you want to):

df['fruits'].str.split(', ', expand=True).stack().value_counts()

Output:

apples           4
oranges          4
peaches          3
black currant    1
black berries    1
Name: fruits, dtype: int64
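For reference, here is a minimal self-contained sketch that puts the question's data and the explode variant side by side with the original whitespace split, so the difference is easy to reproduce. The variable names naive and counts are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'container_id': [1, 2, 3, 4, 5, 6],
    'container': ['black box', 'orange box', 'blue box',
                  'black box', 'blue box', 'white box'],
    'fruits': ['apples, black currant', 'oranges', 'peaches, oranges',
               'apples', 'apples, peaches, oranges',
               'black berries, peaches, oranges, apples'],
})

# Original attempt: with no delimiter, str.split splits on whitespace,
# so 'black currant' and 'black berries' both contribute a 'black' token.
naive = df['fruits'].str.split(expand=True).stack().value_counts()
print(naive['black'])  # 'black' is counted twice

# Split on commas only, then clean up each item.
counts = (
    df['fruits']
    .str.split(',')   # one list of fruits per row
    .explode()        # one row per fruit
    .str.strip()      # drop the space left after each comma
    .value_counts()
)
print(counts)
```

The strip step is what makes this robust: it gives the same result whether the cells use ',' or ', ' between items.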

Specify the comma separator followed by an optional space:

df['fruits'].str.split(r',\s?', expand=True).stack().value_counts()

OUTPUT:

apples           4
oranges          4
peaches          3
black currant    1
black berries    1
dtype: int64
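If the spacing around the commas is less regular than in the question, the same regex idea can be widened. The sketch below uses made-up data (the messy Series is hypothetical, not from the question) and relies on pandas treating a multi-character pattern as a regular expression by default; I believe newer pandas versions also accept an explicit regex=True, but that is not needed here.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas.
messy = pd.Series([
    'apples,black currant',
    'oranges ,  peaches',
    ' apples, oranges ',
])

counts = (
    messy
    .str.strip()             # trim whitespace at the ends of each cell
    .str.split(r'\s*,\s*')   # comma with any amount of surrounding whitespace
    .explode()               # one row per fruit
    .value_counts()
)
print(counts)
```

The raw string (r'...') keeps Python from interpreting \s as a string escape before the pattern reaches pandas.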
