
OK, so I have a list of strings that I would like to use as regex patterns, e.g.

import re
regex_strings = ['test1','test2','test3']

# Obviously this won't work as is! re.compile() expects a single pattern, not a list
regex = re.compile(regex_strings)

I also have another list of strings. e.g.

strgs = ['This is a test1','This is a test2','This is a test1','This is a test1','This is a test3']

I want to iterate over the 'strgs' list and regex check each string against the 'regex_strings' list. Then, if there's a match, return the entire string.

I've been scratching my head here for a bit and I'm not quite sure of the best way to approach this. Any suggestions would be really appreciated!

Regards.

3 Answers


You can use the | (alternation) operator in a regular expression, like this:

re.compile("(" + "|".join(regex_strings) + ")")

So the regular expression becomes (test1|test2|test3). You can see an explanation of this regular expression here: http://regex101.com/r/pR5pU1

Sample run:

import re
regex_strings = ['test1','test2','test3']
regex = re.compile("(" + "|".join(regex_strings) + ")")
strgs = ['This is a test1','This is a test2','This is a test1','This is a test1','This is a test3']
print([strg for strg in strgs if regex.search(strg)])

Output

['This is a test1', 'This is a test2', 'This is a test1', 'This is a test1', 'This is a test3']
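If the entries in regex_strings are meant as literal text rather than regex syntax, one hedged variation is to pass each one through re.escape() before joining, so characters such as . or + are matched literally. A minimal Python 3 sketch, assuming literal search terms; compile_literal_alternation is a hypothetical helper name, not part of the answer above:

```python
import re

def compile_literal_alternation(terms):
    """Build one pattern that matches any of the literal strings in `terms`."""
    # re.escape() makes characters such as '.' or '+' match literally;
    # (?:...) groups the alternatives without creating a capture group.
    return re.compile("(?:" + "|".join(re.escape(t) for t in terms) + ")")

regex_strings = ['test1', 'test2', 'test3']
strgs = ['This is a test1', 'This is a test2', 'This is a test1',
         'This is a test1', 'This is a test3']

pattern = compile_literal_alternation(regex_strings)
matching = [s for s in strgs if pattern.search(s)]
print(matching)
```

With escaping, a term like 'a.b' matches only the text "a.b", not "axb". If you do want regex syntax in the list, leave re.escape() out.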

Edit: If you want to return only the matched part:

import re
regex_strings = ['test1','test2','test3']
regex = re.compile("(" + "|".join(regex_strings) + ")")
strgs = ['This is a test1','This is a test2','This is a test1','This is a test1','This is a test3']
result = []
for strg in strgs:
    temp = regex.search(strg)
    if temp:
        result.append(temp.group())
print(result)

Output

['test1', 'test2', 'test1', 'test1', 'test3']
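If you need both the whole line and the part that matched, one option (a sketch, not part of the original answer) is to keep them together as tuples:

```python
import re

regex_strings = ['test1', 'test2', 'test3']
strgs = ['This is a test1', 'This is a test2', 'This is a test1',
         'This is a test1', 'This is a test3']
regex = re.compile("(" + "|".join(regex_strings) + ")")

pairs = []
for strg in strgs:
    m = regex.search(strg)  # first match anywhere in the string, or None
    if m:
        pairs.append((strg, m.group()))  # (entire line, matched part)

print(pairs)
```

If a single line can contain several of the terms, regex.findall(strg) returns all of them rather than only the first.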

3 Comments

Thanks very much for this. It worked a treat, although, as I mentioned below, I need to spend some time understanding what's going on.
@user1513388 You are welcome. Please go through the documentation for the functions you don't understand and play with them. If you still have doubts, comment here and I'll try to help you :)
Just one quick question: if I wanted to return the actual matches (e.g. test1, test2 or test3) instead of the entire line, how could I do that?

If there isn't too much data and your regular expressions don't need to be compiled, this one line will do it:

print([s for s in strgs for reg in regex_strings if re.search(reg, s)])

Otherwise, maybe this helps:

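import re
# Use a list (not map(), which is a one-shot iterator in Python 3) so it can be re-iterated for every string
compiled_regs = [re.compile(reg) for reg in regex_strings]
print([s for s in strgs for reg in compiled_regs if reg.search(s)])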

Output in both cases:

['This is a test1', 'This is a test2', 'This is a test1', 'This is a test1', 'This is a test3']
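One caveat: because the comprehension loops over every pattern for every string, a string that matches more than one pattern appears once per matching pattern. A sketch that uses any() so each string is kept at most once (the extra 'This' pattern and the shorter strgs list are made up purely to show the difference):

```python
import re

regex_strings = ['test1', 'test2', 'test3', 'This']  # 'This' added for illustration
strgs = ['This is a test1', 'This is a test2', 'no match here']

# Nested loops: one entry per (string, matching pattern) pair
nested = [s for s in strgs for reg in regex_strings if re.search(reg, s)]

# any() stops at the first matching pattern, so each string appears at most once
deduped = [s for s in strgs if any(re.search(reg, s) for reg in regex_strings)]

print(nested)   # each line appears twice: it matches 'testN' and 'This'
print(deduped)  # each matching line appears once
```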



There are nicer ways of doing this (the other answers are good examples), but I thought I'd start from the beginning.

Let's think about this in steps. Compilation isn't needed for now, so let's skip that.

You want to iterate over strgs and check each string. This leaves us with:

for string in strgs:
    check it against each pattern in regex_strings

This expands to:

for string in strgs:
    for regex_string in regex_strings:
        check string against regex_string and print if matching

Now the only question is: how do you check a string against a regex? A quick Google search turns up this page: http://docs.python.org/2/howto/regex.html. Note that re.match() only matches at the beginning of the string, so to find a pattern anywhere in the string you want

re.search(regex_string, string)

Including this gives

for strg in strgs:
    for regex_string in regex_strings:
        m = re.search(regex_string, strg)
        if m:  # short for: if m is not None
            print value of m

Going back to the regex HOWTO gives us m.string (the string that was searched), leaving the complete code:

import re

for strg in strgs:
    for regex_string in regex_strings:
        m = re.search(regex_string, strg)
        if m:  # short for: if m is not None
            print(m.string)
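re.match() only tries to match at the start of the string, whereas re.search() scans the whole string. For lines like these, where the term sits at the end, only re.search() finds it. A quick check:

```python
import re

line = 'This is a test1'

# re.match() anchors the pattern at position 0, so 'test1' is not found here
print(re.match('test1', line))

# re.search() scans the whole string and finds 'test1' at the end
m = re.search('test1', line)
print(m.group())   # the matched text
print(m.string)    # the full string that was searched
```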

Adding compilation of the regex isn't that hard once you've done these steps, so I leave that to you.
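For reference, one way to add the compilation step (a sketch, using re.search() and Python 3's print()) is to compile each pattern once, outside the loop over strgs. The matches list is an illustrative addition so the results can be reused:

```python
import re

regex_strings = ['test1', 'test2', 'test3']
strgs = ['This is a test1', 'This is a test2', 'This is a test1',
         'This is a test1', 'This is a test3']

# Compile each pattern once instead of on every loop iteration
compiled = [re.compile(regex_string) for regex_string in regex_strings]

matches = []
for strg in strgs:
    for regex in compiled:
        m = regex.search(strg)
        if m:  # short for: if m is not None
            matches.append(m.string)
            print(m.string)
```

The re module also keeps a small internal cache of recently compiled patterns, so for a short list like this the speed-up from compiling explicitly is often modest.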

1 Comment

Wow! Thanks for the detailed overview of how this works. I'm going to refer back to this when I get some more time. In the meantime, @thefourtheye's answer did work for me, but I don't fully understand it.
