Find number of occurrences of a list in a string using Python

Question

I have a list containing several thousand short strings and a .csv file containing several hundred thousand short strings. All list elements are unique. For each string in the .csv file, I need to check to see if it contains more than one list element.

For example, I have a string:

example_string = "mermaids have braids and tails"

And a list:

example_list = ["me", "ve", "az"]

Clearly the example string contains more than one list item: me and ve. My code needs to indicate this. However, if the list were

example_list = ["ai", "az", "nr"]

only one list element is contained.

I think that the following code will check to see if each line in my .csv file contains at least one list element. However, that doesn't tell me if it contains more than one different list element.

with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        if any(item in line for item in my_list):
            pass  # Do something

Thanks for all of the helpful, insightful answers! ~♥

17th Lvl Botanist, commented Nov 28, 2012 at 0:24

John La Rooy · Accepted Answer · 2012-11-27 23:34:19Z

2

with open("my_file_of_strings.csv", "r") as data:
    for line in data:       
        if any(item in i for i in line.split() for item in my_list):
            ...

If you need to count them, use sum():

with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        result = sum(item in i for i in line.split() for item in my_list)
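One caveat: the sum() above adds one for every (word, item) pair that matches, so a list element that appears in several words of the same line is counted several times. If what is needed is the number of different list elements in a line, as the question asks, a minimal sketch could look like the following. The helper name count_distinct is hypothetical, and matching against the whole line rather than word by word is an assumption.

```python
def count_distinct(line, items):
    """Return how many different items occur at least once in line."""
    # Each item contributes at most 1, however often it appears in the line.
    return sum(1 for item in items if item in line)


example_string = "mermaids have braids and tails"
print(count_distinct(example_string, ["me", "ve", "az"]))  # 2
print(count_distinct(example_string, ["ai", "az", "nr"]))  # 1
```

A line then qualifies when count_distinct(line, my_list) > 1.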


John Kugelman · Accepted Answer · 2012-11-27 23:34:22Z

1

def contains_multiple(string, substrings):
    count = 0

    for substring in substrings:
        if substring in string:
            count += 1
            if count > 1:
                return True

    return False

for line in data:
    if contains_multiple(line, my_list):
        ...

Not short, but it will exit early as soon as it finds the 2nd match. That may or may not be an important optimization.
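For comparison, the same early exit can probably be written more compactly with a lazy generator and itertools.islice. This is only a sketch: the name contains_multiple_lazy is hypothetical, and it assumes the substrings are unique, as the question states.

```python
from itertools import islice


def contains_multiple_lazy(string, substrings):
    """Return True if at least two of the substrings occur in string."""
    # The generator is lazy, so islice stops testing further substrings
    # as soon as two matches have been produced.
    matches = (s for s in substrings if s in string)
    return len(list(islice(matches, 2))) == 2


example_string = "mermaids have braids and tails"
print(contains_multiple_lazy(example_string, ["me", "ve", "az"]))  # True
print(contains_multiple_lazy(example_string, ["ai", "az", "nr"]))  # False
```

Whether this is clearer than the explicit loop is a matter of taste; both stop at the second match.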


1 Comment

17th Lvl Botanist Over a year ago

Works exactly as I had hoped especially with the breaking once a second match was found =). Thanks! ~♥

Jon Clements · Accepted Answer · 2012-11-27 23:35:23Z

0

Something like:

with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        # Collect the list items found in this line; more than one is a hit.
        if len(set(item for item in my_list if item in line)) > 1:
            pass  # Do something


RocketDonkey · Accepted Answer · 2012-11-28 16:06:48Z

0

I think the other solutions are better for your purpose, but in case you want to keep track of the number of hits and which ones they were, you could try this:

In [14]: from collections import defaultdict

In [15]: example_list = ["me", "ve", "az"]

In [16]: example_string = "mermaids have braids and tails"

In [17]: d = defaultdict(int)

In [18]: for i in example_list:
   ....:     d[i] += example_string.count(i)
   ....:

In [19]: d
Out[19]: defaultdict(<type 'int'>, {'me': 1, 'az': 0, 've': 1})

And then to get the total number of unique matches:

In [20]: matches = sum(1 for v in d.values() if v)

In [21]: matches
Out[21]: 2
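Outside an interactive session, the same idea might be wrapped in a small function that keeps only the items that actually occur. This is a sketch rather than a definitive version: find_hits is a hypothetical name, and it relies on str.count, which counts non-overlapping occurrences.

```python
def find_hits(line, items):
    """Map each item found in line to its number of non-overlapping occurrences."""
    return {item: line.count(item) for item in items if item in line}


hits = find_hits("mermaids have braids and tails", ["me", "ve", "az"])
print(hits)       # {'me': 1, 've': 1}
print(len(hits))  # 2
```

Here len(hits) is the number of different list elements in the line, and the keys show which ones they were.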

edited Nov 28, 2012 at 16:06

answered Nov 27, 2012 at 23:57

