I have a script that takes an argument and tries to find a match using a regex. With a single word I don't have any issues, but when I pass multiple words, the order matters. What can I do so that the regex matches regardless of the order of the supplied words? Here is my example script:

import re
from sys import argv

data = 'some things other stuff extra words'

pattern = re.compile(argv[1])
search = re.search(pattern, data)

print(search)
if search:
    print(search.group(0))
    print(data)

So, based on my example, if I pass "some things" as an arg, it matches, but if I pass "things some", it doesn't match, and I would like it to. Optionally, I would also like it to match if either "some" or "things" is found.

The argument passed could itself be a regex.

  • You could use itertools.permutations() to generate all possible orderings of the words in data, then call your regexp on each one. Commented Dec 28, 2017 at 4:14
  • @JohnGordon This was it! thanks! Commented Dec 28, 2017 at 4:25
  • why not re.compile("|".join(argv[1:]))? You can pass as many as you want. Commented Dec 28, 2017 at 4:27

3 Answers

I think you want something like this:

search = list(filter(None, (re.search(arg, data) for arg in argv[1].split())))

Or

search = re.search('|'.join(argv[1].split()), data)

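You can then check the search results. With the first form, if len(search) == len(argv[1].split()), all patterns matched, and if search is non-empty, at least one of them matched. With the second form, search is a single match object or None, so a truthy result only means that at least one of the patterns matched.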
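As a minimal sketch of that check, each whitespace-separated word can be searched for separately and the number of hits compared with the number of words. The helper name match_words and its return shape are illustrative assumptions, not part of the original script:

```python
import re

def match_words(arg, data):
    """Return (all_matched, any_matched) for the whitespace-separated patterns in arg."""
    patterns = arg.split()
    # A list comprehension gives a real list on both Python 2 and 3,
    # so len() works on the result.
    matches = [m for m in (re.search(p, data) for p in patterns) if m]
    all_matched = bool(patterns) and len(matches) == len(patterns)
    any_matched = bool(matches)
    return all_matched, any_matched

data = 'some things other stuff extra words'
print(match_words('things some', data))    # both words are present
print(match_words('things absent', data))  # only one word is present
```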

OK, I think I've got it: to require every word regardless of order, you can use lookahead assertions like this:

>>> re.search('(?=.*thing)(?=.*some)', data)

You can of course build such a regex programmatically:

re.search(''.join('(?=.*{})'.format(arg) for arg in argv[1].split()), data)
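Since the argument may itself contain regex syntax, one possible refinement is to wrap each word in a non-capturing group, so that an alternation such as thing|stuff stays inside its own lookahead. The sketch below assumes that the words are separated by whitespace and that data may span several lines (hence re.DOTALL); build_all_words_pattern is a hypothetical helper name:

```python
import re

def build_all_words_pattern(arg):
    """Build one regex that requires every whitespace-separated sub-pattern, in any order."""
    # Each sub-pattern is wrapped in a non-capturing group so that an
    # alternation such as 'thing|stuff' stays inside its own lookahead.
    lookaheads = ''.join('(?=.*(?:{}))'.format(p) for p in arg.split())
    # re.DOTALL lets '.' cross newlines, in case data spans several lines.
    return re.compile(lookaheads, re.DOTALL)

data = 'some things other stuff extra words'
print(bool(build_all_words_pattern('things some').search(data)))        # True
print(bool(build_all_words_pattern('th.ngs s[aeiou]me').search(data)))  # True
print(bool(build_all_words_pattern('things missing').search(data)))     # False
```

Note that lookaheads consume no characters, so a successful match is the empty string: search.group(0) is '' even though all the words were found.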

3 Comments

This would work if I were not using regex, but argv in this case could be a regex to allow for greater flexibility in the search.
I like the second option, but it still doesn't answer the question. It will match "some" or "things", but not "things some". It is only matching one word then, not both.
Perfect! This is exactly what I was looking for!

I think it would be better to just create several regexes and match each of them against the string. If any of them matches, you return True.

If you are just trying to match constant strings, the in operator is enough:

'some' in data or 'things' in data
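For a longer argument, the same check can be written as a loop instead of a chain of or. A small sketch, assuming the words are plain strings rather than regexes (the function name contains_words and the require_all flag are hypothetical):

```python
def contains_words(words, data, require_all=False):
    """Check plain substrings: any of them by default, all of them if require_all is set."""
    check = all if require_all else any
    return check(word in data for word in words)

data = 'some things other stuff extra words'
print(contains_words('things some other'.split(), data))                 # True
print(contains_words('things missing'.split(), data, require_all=True))  # False
```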

2 Comments

I have considered that, but it quickly becomes overwhelming as the argument size grows. For example, what if the arg passed is "some things other"? Then it gets too complex.
What becomes too complex? Which solution are you referring to? The in operator or separate regexes? You can use a loop. @securisec

You could also split the data text into sublists of consecutive words and check whether the search words, in either the given or the reversed order, appear among them:

import re

data = 'some, things other stuff extra words blah.'

search = "things, some"

def search_text(data, search):
    # Raw strings keep '\w' from being treated as a string escape.
    data_words = re.compile(r'\w+').findall(data)
    # ['some', 'things', 'other', 'stuff', 'extra', 'words', 'blah']
    search_words = re.compile(r'\w+').findall(search)
    # ['things', 'some']

    len_search = len(search_words)

    # Slide a window of len_search words over the text one word at a time.
    candidates = [data_words[i:i+len_search] for i in range(len(data_words) - len_search + 1)]
    # [['some', 'things'], ['things', 'other'], ['other', 'stuff'], ['stuff', 'extra'], ['extra', 'words'], ['words', 'blah']]

    return search_words in candidates or search_words[::-1] in candidates

print(search_text(data, search))

Which outputs:

True
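The function above only recognises the search words in their given or reversed order, which covers every ordering only when there are two words. A possible generalisation, sketched here under the assumption that the words must still be adjacent in the text, compares each window of words with the search words after sorting both (search_text_any_order is an illustrative name):

```python
import re

def search_text_any_order(data, search):
    """Return True if the search words appear next to each other in data, in any order."""
    data_words = re.findall(r'\w+', data)
    search_words = sorted(re.findall(r'\w+', search))
    n = len(search_words)
    if n == 0:
        return False
    # Slide a window of n words over the text and compare it, sorted,
    # with the sorted search words.
    return any(sorted(data_words[i:i + n]) == search_words
               for i in range(len(data_words) - n + 1))

data = 'some, things other stuff extra words blah.'
print(search_text_any_order(data, 'other, some things'))  # True
print(search_text_any_order(data, 'some other'))          # False
```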
