1

Say I have the following list.

food_list = ['ice cream', 'apple', 'pancake', 'sushi']

And I want to find each item from that list in the following string.

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'

my_str = my_str.lower()

I want to count how many times each item occurs in the string.

ice cream: 2, apple: 1, pancake: 1, sushi: 0

Notice that apple is counted only once, because apples should not count. I can't simply split the string on spaces, because of multi-word items like ice cream.

I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow when applied to bigger data. I wonder if there is a better solution.

import re

for word in food_list:
    find_word = re.sub(r'\b' + word + r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word + ": " + str(count_word))

I hope it's clear enough. Thanks

5 Answers

1

Use re.findall with a dict comprehension:

import re

cnt = {k: len(re.findall(r'\b{}\b'.format(re.escape(k)), my_str)) for k in food_list}

Output:

{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
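
A hedged aside on escaping: when an item contains regex metacharacters, inserting it into the pattern unescaped changes what it matches. The sketch below uses a made-up item, 'fish+chips', and a hypothetical helper count_items (neither is from the question) to show the difference re.escape makes.

```python
import re


def count_items(items, text):
    """Count whole-word occurrences of each item, treating items as literal text."""
    return {
        item: len(re.findall(r'\b{}\b'.format(re.escape(item)), text))
        for item in items
    }


text = 'we ordered fish+chips and more fish+chips, but no fishchips.'

# Unescaped, '+' acts as a quantifier on 'h', so the pattern matches
# 'fishchips' and never the literal 'fish+chips'.
unescaped = len(re.findall(r'\bfish+chips\b', text))

# Escaped, the '+' is matched literally.
escaped = count_items(['fish+chips'], text)

print(unescaped)  # 1
print(escaped)    # {'fish+chips': 2}
```

For the plain words in the question the escaped and unescaped patterns behave the same, so this only matters once the list holds punctuation.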

1 Comment

I appreciated all the other responses. But I like this one the most, because I can instantly understand it, and it's the one I ended up using. Thank you all.
1

You can match exact words in a string using re.finditer:

import re


food_list = ['ice cream', 'apple', 'pancake', 'sushi']

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()


output = {}
for word in food_list:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count

Print the results:

for word, count in output.items():
    print(word, count)

Output:

ice cream 2
apple 1
pancake 1
sushi 0
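
To illustrate what the \b anchors contribute in the pattern above, here is a small sketch on a shortened, made-up sentence. It compares the same re.finditer count with and without the word boundaries.

```python
import re

text = 'my sister ate some apples. i brought one apple.'

# Without boundaries, 'apple' also matches the first five letters of 'apples'.
loose = sum(1 for _ in re.finditer(r'apple', text))

# With \b on both sides, only the standalone word matches.
strict = sum(1 for _ in re.finditer(r'\bapple\b', text))

print(loose, strict)  # 2 1
```

So re.finditer only decides how the matches are returned; the boundaries in the pattern decide what counts as a match.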

2 Comments

Interesting, I had never heard of re.finditer. But even when using it, you still need the \b?
@AnnaRG Actually, re.finditer just returns an iterator yielding match objects; the \b is still needed so that the pattern matches only the whole word in the string.
0

You can simply use a regex that takes word boundaries into account in a dictionary comprehension:

>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}

0

A single regex scan can find all the matches at once; the count for each item is then computed from the list of matches:

import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
regex = '|'.join(r'\b' + re.escape(item) + r'\b' for item in food_list)
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}
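
As a possible refinement of the single-scan idea, the tally can be done with collections.Counter instead of calling list.count once per item. This is only a sketch: the shortened my_str, the longest-first ordering, and the names alternatives and tally are my own assumptions, not part of the answer above.

```python
import re
from collections import Counter

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = ('i had pancake for breakfast, while my sister ate some apples. '
          'i brought one apple. he gave us free ice cream. '
          'it was the best ice cream i had this year.')

# Build one alternation, longest items first, so that a longer item is
# tried before a shorter item that is a prefix of it.
alternatives = sorted(food_list, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, alternatives)) + r')\b')

# Counter tallies every match in one pass over the match list.
tally = Counter(pattern.findall(my_str))

# Items that never matched get 0, because Counter returns 0 for missing keys.
count_dict = {item: tally[item] for item in food_list}

print(count_dict)
# {'ice cream': 2, 'apple': 1, 'pancake': 1, 'sushi': 0}
```

One limitation to keep in mind: a single scan consumes each stretch of text once, so if the list held both 'ice' and 'ice cream', every 'ice cream' in the text would be credited to only one of them.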

0

You can scan the string with str.find, advancing the start position after each match:

def find_all(a_str, sub):
    """Yield the start index of every occurrence of sub in a_str."""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts.update({item: len(list(find_all(my_str, item)))})
    print(counts)
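
Because str.find matches substrings, the loop above also counts the apple inside apples, which the question wants excluded. One way to handle that without regex is to check the characters on either side of each hit. The sketch below uses a hypothetical helper, count_whole_words, on a shortened sentence, and assumes that "word character" means str.isalnum (unlike \b, this does not treat underscores as part of a word).

```python
def count_whole_words(text, word):
    """Count occurrences of word in text that are not part of a longer word."""
    count = 0
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # A hit counts only if the neighbouring characters are not letters or digits.
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            count += 1
            start = text.find(word, end)        # continue after the counted hit
        else:
            start = text.find(word, start + 1)  # rejected hit: move on by one


    return count


text = 'my sister ate some apples. i brought one apple. free ice cream, best ice cream.'
counts = {w: count_whole_words(text, w) for w in ['ice cream', 'apple', 'pancake', 'sushi']}
print(counts)
# {'ice cream': 2, 'apple': 1, 'pancake': 0, 'sushi': 0}
```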
