1

Basically, I have a list of substrings that I am searching a string for. I'm currently using any() and doing some work if one of the substrings is found in the string. I want to start logging the matches so I can keep some stats on them.

Is there a way to do the same thing as any(), but store the match in a variable? I'm fetching and searching up to 100 strings every 10 seconds for a list of 25-30 substrings. The only thing I can think of is iterating through each substring in the list for each string, but I'm not sure of the performance implications of that approach.

3 Answers

2

Let's look at this example:

s = "Thisisarandomstringthatiwanttotype"
subst = ["This", "random", "hullo", "type"]

To return all the substrings that match:

list(filter(lambda x: x in s, subst))
>> ['This', 'random', 'type']

To return the starting index of the substrings that match, you can pass the list of strings returned from the code segment above to a map function to find their index:

list(map(lambda x: s.index(x), filter(lambda x: x in s, subst)))
>> [0, 7, 30]

Similarly, you could check the length of the returned strings using map over the filter:

list(map(len, filter(lambda x: x in s, subst)))
>> [4, 6, 4]

Or find the longest matching substring:

max(filter(lambda x: x in s, subst), key=len)
>> 'random'
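
If only the first match is needed, as with any(), a generator expression passed to next() stops at the first hit and hands it back. This is a minimal sketch reusing s and subst from above; the name first_match is only illustrative:

```python
s = "Thisisarandomstringthatiwanttotype"
subst = ["This", "random", "hullo", "type"]

# next() returns the first substring found in s and stops iterating there,
# much like any(); the second argument (None) is returned if nothing matches.
first_match = next((x for x in subst if x in s), None)

if first_match is not None:
    print("matched:", first_match)
```

Here first_match ends up as 'This', and it can be logged or counted before doing the rest of the work.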

1

There are multiple ways to do this. Regular expressions (as FreddieV4 suggested) are very powerful.

However, another simple approach is to use a list comprehension like:

matches = [x for x in string.split() if x in substrings]

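This loops over the words in the string and checks whether each word is one of the entries in substrings; if so, it is kept in matches and can be used for logging purposes. Note that this compares whole whitespace-separated words, so a substring that sits inside a longer word is not matched.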

You can extend this further to handle a list of strings as input instead of a single string, all in a single list comprehension.

An extensive example is shown below:

substrings = ["cool", "test", "notpresent"]

# Get matches for a single string
string = "This is a basic test"
matches = [x for x in string.split() if x in substrings]
print(matches)
# >> ['test']


# Get matches for multiple strings
strings = ["I am so awesome", "you are cool", "I think so", "Yep this is a test"]
matches = [x for string in strings for x in string.split() if x in substrings]
print(matches)
# >> ['cool', 'test']
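
Since the goal is to keep stats on the matches, the hits can be fed into a collections.Counter. The comprehension above matches whole words only, so this sketch uses `in` on the full string instead, which is presumably closer to what the any() check was doing. The name match_counts and the extra sample string are assumptions for illustration:

```python
from collections import Counter

substrings = ["cool", "test", "notpresent"]
strings = ["I am so awesome", "you are cool", "I think so",
           "Yep this is a test", "testing is cool"]

match_counts = Counter()
for string in strings:
    # Substring check (not whole-word): "test" also matches "testing".
    found = [sub for sub in substrings if sub in string]
    if found:
        # ... the existing per-string work would go here ...
        match_counts.update(found)

print(match_counts)
```

With 25-30 substrings and about 100 strings per batch this is only a few thousand `in` checks, so the plain loop is unlikely to be a bottleneck, though that is worth measuring on real data.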

10 Comments

You need to split the string.
Why is that @JaredGoguen?
x for x in string will iterate over the individual characters in string.
That's really interesting. x for string in strings for x in string.split() if x in substrings looks crazy! I'm iterating over the strings for other purposes so the first example with a single string will probably suit my needs. Thanks!
That's true, but that behavior is similar to the any() function currently used by OP. If he wants more flexibility, regexes are a better way to go.
0

For this sort of thing, you could use the re module.

>>> import re
>>> m = re.search(r"substring1|substring2|substring3", string)

string would be the string you're searching through, and m would be a match object for the first of the substrings found (or None if none of them occur); m.group() returns the matched text. The | separates the alternatives you're looking for, i.e. substring1, substring2, substring3; you could use regex patterns rather than plain substrings as well.
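
A minimal sketch of the regex route, assuming the substrings are plain text rather than patterns (hence re.escape). The sample substrings and string are made up for illustration. Note that search() reports only the first match, while findall() returns every non-overlapping match:

```python
import re

substrings = ["cool", "test", "notpresent"]
string = "Yep this is a cool test"

# Build one alternation pattern; re.escape keeps special characters literal.
pattern = re.compile("|".join(re.escape(sub) for sub in substrings))

m = pattern.search(string)             # first match only, or None
first = m.group() if m else None       # 'cool'
all_matches = pattern.findall(string)  # ['cool', 'test']

print(first, all_matches)
```

Compiling the pattern once and reusing it for every incoming string avoids rebuilding it on each search.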

1 Comment

Would this store all matched substrings in m or just the first one?
