2

I'm wondering how to detect if two substrings match a main string in a specific order. For example if we're looking for "hours" and then "minutes" anywhere at all in a string, and the string is "what is 5 hours in minutes", it would return true. If the string was "what is 5 minutes in hours", it would return false.

  • regex? /hours.*minutes/? Commented Mar 7, 2016 at 19:07
  • And use '\b' if you need word boundaries: '\bhours\b.*\bminutes\b' Commented Mar 10, 2016 at 18:45

5 Answers

2
s = "what is 5 hours in minutes"
a, b = s.find("hours"),s.find("minutes")
print(-1 < a < b)

You could also avoid checking for b if a does not exist in the string:

def inds(s, s1, s2):
    a = s.find(s1)
    return -1 < a < s.find(s2)

If you want to start at a + 1 it is trivial to change:

def inds(s, s1, s2):
    a = s.find(s1)
    return -1 < a < s.find(s2, a+1)

But if you only need the first occurrence of a to come before the first occurrence of b, stick to the first solutions. You also did not say whether substrings inside longer words should count as matches, i.e.:

a = "foo"
b = "bar"

Would match:

"foobar"

But they are not actual words in the string. If you want to match actual words, you will need to either split and clean the text or use a regex with word boundaries.

To match exact words rather than partial matches, use a regex with word boundaries:

import re


def consec(s, *args):
    if not args:
        raise ValueError("args cannot be empty")
    prev = 0
    for w in args:
        # re.search() takes flags as its third argument, not a start index,
        # so compile the pattern and pass the start position to its search().
        # re.escape() keeps any regex metacharacters in w literal.
        ind = re.compile(r"\b{}\b".format(re.escape(w))).search(s, prev)
        if not ind:
            return False
        prev = ind.end()
    return True

Which won't match "foo" and "bar" in foobar:

In [9]: consec("foobar","foo","bar")
Out[9]: False

In [10]: consec("foobar bar for bar","foo","bar")
Out[10]: False

In [11]: consec("foobar bar foo bar","foo","bar")
Out[11]: True

In [12]: consec("foobar","foo","bar")
Out[12]: False

In [13]: consec("foobar bar foo bar","foo","bar")
Out[13]: True

In [14]: consec("","foo","bar")
Out[14]: False

In [15]: consec("foobar bar foo bar","foobar","foo","bar")
Out[15]: True
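
The split-and-clean alternative mentioned earlier in this answer is not shown, so here is a minimal sketch of one way it might look. The function name `words_in_order` is hypothetical, and "cleaning" is assumed to mean only stripping ASCII punctuation from the ends of each whitespace-separated token:

```python
import string


def words_in_order(s, *words):
    """Return True if every word appears as a whole token in s, in the given order."""
    # Split on whitespace and strip surrounding punctuation from each token.
    tokens = [t.strip(string.punctuation) for t in s.split()]
    pos = 0
    for w in words:
        try:
            # list.index accepts a start offset, so each search resumes
            # just after the previous hit.
            pos = tokens.index(w, pos) + 1
        except ValueError:
            return False
    return True


print(words_in_order("what is 5 hours in minutes", "hours", "minutes"))  # True
print(words_in_order("foobar", "foo", "bar"))                            # False
```

Unlike the regex version, this compares whole tokens exactly, so it is case-sensitive and will not treat hyphenated or apostrophized words specially.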

7 Comments

This will not work for the string "minutes hours minutes", in which "hours" indeed appears before "minutes". You need to search for "minutes" starting from position a+1.
@MathiasRav, I will leave it to the OP to decide, but if it is required then it is a simple fix.
@ShaneSmiskol Keep in mind the comment by @MathiasRav. This will return false if your string is 'minutes hours minutes'
@Kupiakos How do I fix that then?
@ShaneSmiskol My answer given handles that. It also handles any number of words.
1

This will work with any set of words and any string:

def containsInOrder(s, *words):
    last = -1
    for word in words:
        last = s.find(word, last + 1)
        if last == -1:
            return False
    return True

Used like so:

>>> s = 'what is 5 hours in minutes'
>>> containsInOrder(s, 'hours', 'minutes')
True
>>> containsInOrder(s, 'minutes', 'hours')
False
>>> containsInOrder(s, '5', 'hours', 'minutes')
True
>>> containsInOrder('minutes hours minutes', 'hours', 'minutes')
True
>>> containsInOrder('minutes hours minutes', 'minutes', 'hours')
True

4 Comments

containsInOrder("foo")->True
@PadraicCunningham It contains the empty string.
What empty string? I passed nothing
@PadraicCunningham The function answers "does this string contain the words given, in order?". You gave it nothing. All strings contain nothing. Therefore, it is True.
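
The empty-argument behavior discussed in these comments follows the same convention as Python's built-in `all()`, which returns True for an empty iterable. For callers who would rather reject that case, a guard could be added; the sketch below is only an illustration, and both function names in it are hypothetical (the first repeats the answer's logic so the block runs on its own):

```python
def contains_in_order(s, *words):
    # Same logic as containsInOrder above, repeated so this sketch is self-contained.
    last = -1
    for word in words:
        last = s.find(word, last + 1)
        if last == -1:
            return False
    return True


def contains_in_order_strict(s, *words):
    # Hypothetical variant: treat "no words given" as an error instead of True.
    if not words:
        raise ValueError("at least one word is required")
    return contains_in_order(s, *words)


print(all([]))                   # True: there is nothing that could fail
print(contains_in_order("foo"))  # True for the same reason
```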
0

You could use a regular expression such as "hours.*minutes", or you could use a simple string search that looks for "hours", notes the location where it is found, then does another search for "minutes" starting at that location.
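
Both approaches described above can be sketched roughly as follows. The function names are hypothetical, and the regex version assumes the two substrings sit on the same line, since `.` does not match a newline unless `re.DOTALL` is passed:

```python
import re


def in_order_regex(s, first, second):
    # re.escape keeps characters such as "." or "+" in the inputs literal.
    pattern = re.escape(first) + ".*" + re.escape(second)
    return re.search(pattern, s) is not None


def in_order_find(s, first, second):
    start = s.find(first)
    if start == -1:
        return False
    # Resume the second search just past the end of the first match.
    return s.find(second, start + len(first)) != -1


s = "what is 5 hours in minutes"
print(in_order_regex(s, "hours", "minutes"))  # True
print(in_order_find(s, "minutes", "hours"))   # False
```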

Comments

0
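s = "what is 5 hours in minutes"
a, b = "hours", "minutes"
# str.index raises ValueError if either substring is missing from s
if s.index(a) < s.index(b):
    print(True)
else:
    print(False)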

Use the index method to determine which one comes first. The if statement gives a conditional as to what you do once you find out which comes first. Do you understand what I'm trying to say?

Comments

0

A regex will work well here. The regex r"hours.*minutes" says: look for "hours", followed by 0 or more of any character, followed by "minutes". Also, make sure to use the search function in the regex library rather than match, as match only checks from the beginning of the string.

import re
true_state = "what is 5 hours in minutes"
false_state = "what is 5 minutes in hours"
pat = re.compile(r"hours.*minutes")
statements = [true_state, false_state]
for state in statements:
    ans = pat.search(state)
    if ans:
        # Only the first string matches, so only it is printed.
        print(state)
        print(ans.group())

Output

what is 5 hours in minutes
hours in minutes
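
To illustrate the point about search versus match, here is a small sketch using the same pattern. The multi-line string is an added assumption, included only to show that `.` stops at a newline unless `re.DOTALL` is passed:

```python
import re

pat = re.compile(r"hours.*minutes")
s = "what is 5 hours in minutes"

print(pat.match(s))         # None: the string does not start with "hours"
print(bool(pat.search(s)))  # True: search scans the whole string

# "." does not match a newline unless re.DOTALL is passed.
multi = "5 hours\nin minutes"
print(bool(pat.search(multi)))                                     # False
print(bool(re.search(r"hours.*minutes", multi, flags=re.DOTALL)))  # True
```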

Comments
