Matching strings in Python

Question

My question is about identifying variants of a string in Python. Let me explain what I am trying to do:

A string such as tom and jerry could also be written, in lowercase, as:

tom n jerry
tom_jerry
tom & jerry
tom and jerry

and so on. As this minimal example shows, there are already four possible spellings, and even if I created a dictionary with these four, I would still miss a string containing tom _ jerry. What can I do to recognize tom and jerry? Creating many rules seems very inefficient. Is there a more efficient way to do this?

Recognizing all possibilities will require artificial intelligence. Maybe an NLP library can do this.
– Barmar, Commented Aug 26, 2022 at 20:07
It really depends on your acceptance criteria. For this specific example you could do s.startswith("tom") and s.endswith("jerry") to test a given string s, and it would return true for all of the examples. But it would also return true for very long strings that you might not want to accept, and it would return false on minor misspellings of either tom or jerry, which you also might not want.
– Samwise, Commented Aug 26, 2022 at 20:08
A better approach might be to compute the Levenshtein distance (which is relatively straightforward) and decide on a particular threshold that is "close enough" for your purposes.
– Samwise, Commented Aug 26, 2022 at 20:09
There are fundamentally two ways to do this: (1) Use NLP. This might be a bit over-the-top depending on your use case. (2) Create a set of rules, such as s.startswith("tom") and s.endswith("jerry") and len(s) < 15
– Lecdi, Commented Aug 26, 2022 at 20:13
@Lecdi I don't think I can do that, because the string is part of a sentence.
– Slartibartfast, Commented Aug 26, 2022 at 20:15
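
The Levenshtein-distance idea from the comments could look roughly like the sketch below. The function names `levenshtein` and `is_close`, and the default threshold of 5, are assumptions chosen for illustration; the right threshold depends on how much variation you want to accept.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def is_close(candidate: str, target: str = "tom and jerry", max_distance: int = 5) -> bool:
    # max_distance is a hypothetical threshold, not a recommended value
    return levenshtein(candidate.lower(), target) <= max_distance


for variant in ["tom n jerry", "tom_jerry", "tom & jerry", "tom and jerry"]:
    print(variant, levenshtein(variant, "tom and jerry"), is_close(variant))
```

This compares whole strings, so for a phrase embedded in a sentence you would still need to pick out candidate substrings first.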

OldManSeph · Accepted Answer · 2022-08-26 20:32:37Z

2

This will find any of those combinations in a sentence:

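combo = "tom n jerry"
string = "This is an episode of " + combo + " that deals with something."
start = string.find("tom")
end = string.find("jerry", start)  # look for "jerry" only after "tom"
if start != -1 and end != -1:  # find() returns -1 when the text is absent
    substring = string[start:end + len("jerry")]
    print(substring)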
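
If the set of connectors is small and known, a single regular expression is one alternative to slicing between the two names. This is a minimal sketch, not part of the original answer: the names `PATTERN` and `find_variant` are hypothetical, and the connector list (`and`, `n`, `&`, or nothing) is an assumption based on the examples in the question.

```python
import re

# "tom", then an optional connector ("and", "n", or "&") surrounded by any
# mix of whitespace and underscores, then "jerry". \b keeps "atom" or
# "jerrycan" from matching.
PATTERN = re.compile(r"\btom[\s_]*(?:and|n|&)?[\s_]*jerry\b", re.IGNORECASE)


def find_variant(sentence: str):
    """Return the matched variant as it appears in the sentence, or None."""
    match = PATTERN.search(sentence)
    return match.group(0) if match else None


for combo in ["tom n jerry", "tom_jerry", "tom & jerry", "tom and jerry", "tom _ jerry"]:
    sentence = "This is an episode of " + combo + " that deals with something."
    print(find_variant(sentence))
```

Unlike the slice, this does not match arbitrary text between the two names, but it also will not catch misspellings such as "tom and jery".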


Michael Gathara · Accepted Answer · 2022-08-26 23:30:34Z

You could attempt this using a sequence matcher.

from difflib import SequenceMatcher

def checkMatch(firstWord: str, secondWord: str, strictness: float) -> bool:
    # ratio() is a similarity score from 0.0 (nothing in common) to 1.0 (identical)
    ratio = SequenceMatcher(None, firstWord.strip(), secondWord.strip()).ratio()
    return ratio > strictness

if __name__ == "__main__":
    originalWord = "tom and jerry"
    toMatch = "tom_jerry" # chose this one as it is the least likely in your example
    toMatch = toMatch.lower() # lower() returns a new string, so reassign; matching is easier when both are the same case
    strictness = 0.6 # a strictness of 0.6 would mean the words are generally pretty similar
    print(checkMatch(originalWord, toMatch, strictness))

You can learn more about how sequence matcher works here: https://towardsdatascience.com/sequencematcher-in-python-6b1e6f3915fc
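
Since the phrase sits inside a longer sentence, one way to apply the same idea is to slide a window of words across the sentence and keep the best-scoring window. This is only a sketch under assumptions: `best_fuzzy_match` is a hypothetical helper, windows are split on whitespace, and the 0.6 threshold is carried over from the example above rather than tuned.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(sentence: str, target: str = "tom and jerry", threshold: float = 0.6):
    """Return the run of words most similar to target, or None if no run
    reaches the threshold."""
    words = sentence.lower().split()
    max_words = len(target.split())  # variants may use fewer words, e.g. "tom_jerry"
    best_ratio, best_phrase = 0.0, None
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            ratio = SequenceMatcher(None, target, phrase).ratio()
            if ratio > best_ratio:
                best_ratio, best_phrase = ratio, phrase
    return best_phrase if best_ratio >= threshold else None


print(best_fuzzy_match("This is an episode of tom and jerry that deals with something."))
print(best_fuzzy_match("I watched tom_jerry yesterday"))
```

The cost grows with sentence length times the number of window sizes, which is fine for short sentences but may need a faster pre-filter for large texts.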
