How do I count the occurrence of each item from a list in a string in Python?

Question

Say I have the following list.

food_list = ['ice cream', 'apple', 'pancake', 'sushi']

And I want to find each item from that list in the following string.

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'

my_str = my_str.lower()

I want to count how many times each item appears in the string:

ice cream: 2, apple: 1, pancake: 1, sushi: 0

Notice that apple is counted only once, because "apples" should not count as a match. I can't simply split the string on spaces either, because of multi-word items like ice cream.

I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow when applied to larger data, so I wonder whether there is a better solution.

import re

for word in food_list:
    # Replace whole-word matches with a placeholder, then count the placeholders
    find_word = re.sub(r'\b' + word + r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word + ": " + str(count_word))

I hope it's clear enough. Thanks
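Regarding the substitute-then-count approach above: re.subn performs the same substitution but also returns how many replacements it made, so the "***" placeholder never has to be counted separately. A minimal sketch using the question's food_list and my_str; count_with_subn is a hypothetical helper name. It still scans the string once per item, so this tidies the code rather than promising a speed-up.

```python
import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = ('I had pancake for breakfast this morning, while my sister ate some apples. '
          'I brought one apple and ate it on my way to work. My coworker was having his '
          'birthday today, and he gave us free ice cream. It was the best ice cream I had this year.').lower()


def count_with_subn(words, text):
    """Hypothetical helper: count whole-word matches via re.subn."""
    counts = {}
    for word in words:
        # re.escape guards against regex metacharacters inside an item;
        # re.subn returns a (new_string, number_of_substitutions) tuple.
        _, n = re.subn(r'\b' + re.escape(word) + r'\b', '***', text)
        counts[word] = n
    return counts


print(count_with_subn(food_list, my_str))
# {'ice cream': 2, 'apple': 1, 'pancake': 1, 'sushi': 0}
```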

Chris · Accepted Answer · 2019-09-26 04:29:14Z

1

Use re.findall with a dict comprehension:

import re

# re.escape keeps regex metacharacters in an item from being treated as pattern syntax
cnt = {k: len(re.findall(r'\b{}\b'.format(re.escape(k)), my_str)) for k in food_list}

Output:

{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
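If you would rather not lowercase the whole string first, passing the re.IGNORECASE flag to re.findall gives the same case-insensitive matching. A small sketch; count_foods and the sample sentence are made up for illustration.

```python
import re


def count_foods(foods, text):
    """Hypothetical helper: case-insensitive whole-word counts for each item."""
    return {
        # re.escape treats the item literally; IGNORECASE replaces text.lower()
        food: len(re.findall(r'\b' + re.escape(food) + r'\b', text, flags=re.IGNORECASE))
        for food in foods
    }


text = 'Ice cream first, then an Apple, then more ICE CREAM. No apples left.'
print(count_foods(['ice cream', 'apple', 'sushi'], text))
# {'ice cream': 2, 'apple': 1, 'sushi': 0}
```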

edited Sep 26, 2019 at 4:29

answered Sep 26, 2019 at 4:24

Chris


1 Comment

catris25 Over a year ago

I appreciated all the other responses. But I like this one the most, because I can instantly understand it, and it's the one I ended up using. Thank you all.

Chris · Accepted Answer · 2019-09-26 04:39:04Z

1

You can match the exact word in the string using re.finditer:

import re


food_list = ['ice cream', 'apple', 'pancake', 'sushi']

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()


output = {}
for word in food_list:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count

Printing the results:

for word, count in output.items():
    print(word, count)

Output:

ice cream 2
apple 1
pancake 1
sushi 0
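The re.escape call in this answer matters as soon as an item contains a regex metacharacter. A small illustration with a made-up item 'a.b', where an unescaped . would match any character:

```python
import re

text = 'axb a.b'
item = 'a.b'  # hypothetical item containing the metacharacter '.'

# Unescaped, '.' matches any character, so 'axb' is counted too.
unescaped = sum(1 for _ in re.finditer(r'\b%s\b' % item, text))
# Escaped, '.' only matches a literal dot.
escaped = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(item), text))

print(unescaped, escaped)  # 2 1
```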

edited Sep 26, 2019 at 4:39

Chris


answered Sep 26, 2019 at 4:32

Saleem Ali


2 Comments

catris25 Over a year ago

Interesting. Never heard of re.finditer. But, even when you already use that, you still have to use the \b thing?

Saleem Ali Over a year ago

@AnnaRG actually, re.finditer just returns an iterator yielding match objects; we still have to use \b so that only the exact word matches, not any substring.
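To see what \b adds on top of re.finditer: without it, the pattern also matches 'apple' inside 'apples'. The sketch below, on a short made-up sentence, collects the start offset of each match (via the match object's start() method) with and without the word boundaries.

```python
import re

text = 'some apples and one apple'

# Without word boundaries, 'apple' is also found inside 'apples'.
plain = [m.start() for m in re.finditer(r'apple', text)]
# With \b on both sides, only the standalone word matches.
bounded = [m.start() for m in re.finditer(r'\bapple\b', text)]

print(plain)    # [5, 20]
print(bounded)  # [20]
```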

Selcuk · Accepted Answer · 2019-09-26 04:26:41Z

0

You can simply use a regex that takes word boundaries into account in a dictionary comprehension:

>>> import re
>>> {food: sum(1 for _ in re.finditer(r"\b{}\b".format(re.escape(food)), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}

answered Sep 26, 2019 at 4:26

Selcuk


yabhishek · Accepted Answer · 2019-09-26 04:37:42Z

0

With a single combined regex, the string is scanned once to find all matches, and the count for each item is then computed from the list of matches found.

import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
# One alternation pattern; re.escape guards against regex metacharacters in the items
regex = '|'.join(r'\b' + re.escape(item) + r'\b' for item in food_list)
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}
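A variant of this single-scan idea, as a sketch: collections.Counter tallies the matches in one pass instead of calling all_matches.count(item) once per item. Sorting the items longest-first is an assumption about the desired behavior: Python's regex alternation tries alternatives from left to right and accepts the first that matches, so if one item is a prefix of another (say a hypothetical 'ice' alongside 'ice cream'), the longer one has to come first to be counted. count_items is a hypothetical helper name.

```python
import re
from collections import Counter


def count_items(items, text):
    """Hypothetical helper: one regex pass, then tally the matches."""
    # Longer items first, so 'ice cream' is tried before a shorter 'ice'.
    ordered = sorted(items, key=len, reverse=True)
    # Non-capturing group, so findall returns the whole matched text.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, ordered)) + r')\b')
    tally = Counter(pattern.findall(text))
    # Counter returns 0 for missing keys, so unmatched items report 0.
    return {item: tally[item] for item in items}


print(count_items(['ice', 'ice cream'], 'ice cream and ice'))
# {'ice': 1, 'ice cream': 1}
```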

answered Sep 26, 2019 at 4:37

yabhishek


Pratibha Gupta · Accepted Answer · 2019-09-26 04:59:47Z

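You can scan the string with str.find, moving the start position forward after each match. On its own, str.find also matches apple inside apples, so this version only keeps matches that are not joined to an adjacent letter or digit:

def find_all(a_str, sub):
    """Yield the start index of each whole-word occurrence of sub in a_str."""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        end = start + len(sub)
        # Roughly what \b does in a regex: no letter or digit right before or after
        before_ok = start == 0 or not a_str[start - 1].isalnum()
        after_ok = end == len(a_str) or not a_str[end].isalnum()
        if before_ok and after_ok:
            yield start
            start = end  # use start += 1 to find overlapping matches
        else:
            start += 1

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts[item] = len(list(find_all(my_str, item)))
    print(counts)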
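The comment in find_all refers to overlapping matches. A standalone sketch of the difference, using a hypothetical count_occurrences helper with no word-boundary check: advancing by len(sub) skips past each match, while advancing by 1 lets matches overlap. For comparison, str.count always counts non-overlapping occurrences.

```python
def count_occurrences(text, sub, overlapping=False):
    """Hypothetical helper: count occurrences of a non-empty sub using str.find."""
    count = 0
    start = 0
    while True:
        start = text.find(sub, start)
        if start == -1:
            return count
        count += 1
        # Step past the whole match, or only one character to allow overlaps.
        start += 1 if overlapping else len(sub)


print(count_occurrences('aaaa', 'aa'))                    # 2
print(count_occurrences('aaaa', 'aa', overlapping=True))  # 3
print('aaaa'.count('aa'))                                 # 2
```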
