2

I want to create an application that checks if word that the user typed in contains word/words from a separate text file (e.g. input = 'teeth', separate file contains the word 'eet') it should return True regardless of the sequence of the characters.

I looked at this thread matching all characters in any order in regex which is cool as it's working using set(). Problem is, set() doesn't allow you to use repeated characters(e.g. eeet, aaat).

I would like to know how should I approach this problem?

3
  • then count the leters occurrences with collections.Counter and compare dicts. Commented Feb 8, 2018 at 15:02
  • Does input 'teeth' contain the word 'eh'? Commented Feb 8, 2018 at 15:21
  • @StefanPochmann If the word 'eh' is listed in the separate file, it should return as True regardless of the sequence of the characters. That's how I like the program to work. Commented Feb 8, 2018 at 15:25

2 Answers 2

2

I would create a collections.Counter object from both strings, counting the characters, then substract the dicts, test if resulting dict is empty (which means string contains substring with cardinality respected)

import collections

def contains(substring, string):
    c1 = collections.Counter(string)
    c2 = collections.Counter(substring)
    return not(c2-c1)

print(contains("eeh","teeth"))
print(contains("eeh","teth"))

result:

True
False

Note that your example isn't representative as

>>> "eet" in "teeth"
True

that's why I changed it.

Sign up to request clarification or add additional context in comments.

3 Comments

I will try this
This is clever! Originally I thought it would have to reduce to something like the classic map-reduce interview question about counting anagrams (here it would be anagrams of every substring of the input string!) -- but since the only problem is set membership, your clever idea is much better.
thanks, and your answer is another improvement for performance.
2

I know it's unlikely, but if performance really mattered for very large inputs, you could avoid needing to create the second Counter and just iterate directly over the characters of the substring, allowing the possibility of terminating early if you run out of a given character.

In [26]: def contains2(string, substring):
    ...:     c = Counter(string)
    ...:     for char in substring:
    ...:         if c[char] > 0:
    ...:             c[char] -= 1
    ...:         else:
    ...:             return False
    ...:     return True
    ...: 

In [27]: contains2("teeth", "eeh")
Out[27]: True

In [28]: contains2("teeth", "ehe")
Out[28]: True

In [29]: contains2("teth", "ehe")
Out[29]: False

In [30]: contains2("teth", "eeh")
Out[30]: False

In [31]: def contains(string, substring):
    ...:     c1 = collections.Counter(string)
    ...:     c2 = collections.Counter(substring)
    ...:     return not(c2-c1)
    ...: 

In [32]: contains("teth", "ehe")
Out[32]: False

In [33]: contains("teeth", "ehe")
Out[33]: True

In [34]: contains("teeth", "eeh")
Out[34]: True

In [35]: %timeit contains("teeth", "eeh")
19.6 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [36]: %timeit contains2("teeth", "eeh")
9.59 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [37]: %timeit contains("friday is a good day", "ss a")
22.9 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [38]: %timeit contains2("friday is a good day", "ss a")
9.52 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

1 Comment

Apologies for the follow up question. I would like to know how's the approach for this if the subtring is a list? I tried looping on each word in the list first then loop on each character for each word. Should i put the function inside the for loop so each loop, the value for substring will change then I will proceed with the operation?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.