I have to search for the following patterns in a file (any match qualifies):

pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
pattern = [re.compile(x) for x in pattern_strings]

and a function that uses them:

def find_pattern(path):
    with open(path, 'r') as f:
        for line in f:
            found = pattern.search(line)
            if found:
                logging.info('found - ' + found)

When I try to use it

find_pattern('myfile')

I see AttributeError: "'list' object has no attribute 'search'"

because pattern is a list of compiled patterns rather than a single one:

[<_sre.SRE_Pattern object at 0x107948378>, <_sre.SRE_Pattern object at 0x107b31c70>, <_sre.SRE_Pattern object at 0x107b31ce0>, <_sre.SRE_Pattern object at 0x107ac3cb0>, <_sre.SRE_Pattern object at 0x107b747b0>, <_sre.SRE_Pattern object at 0x107b74828>, <_sre.SRE_Pattern object at 0x107b748a0>, <_sre.SRE_Pattern object at 0x107b31d50>, <_sre.SRE_Pattern object at 0x107b31dc0>]

How can I have one pattern which looks for all strings in pattern_strings?

1 Answer

You could simply join all the expressions together with a |, which gives a single pattern that matches any of them:

import re

pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
# Join the alternatives with | so one compiled pattern matches any of them.
pattern_string = '|'.join(pattern_strings)
pattern = re.compile(pattern_string)

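This does, however, assume that none of your patterns contains anything a plain join would break, such as numbered backreferences (the group numbers shift once the patterns are combined) or regex metacharacters that you meant literally. For the ones in your example, it should work. For more complex patterns, it may not.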
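If the strings are meant as literal text rather than as regular expressions, one way to avoid that problem is to escape each one with re.escape before joining. The following is only a sketch under that assumption: build_pattern is a hypothetical helper, and this find_pattern takes an iterable of lines instead of a path so it can be tried without a file. It also logs found.group(), since search() returns a match object that cannot be concatenated to a string directly.

```python
import logging
import re


def build_pattern(strings):
    # Hypothetical helper: escape each string so characters such as '.'
    # or '|' are matched literally, then join the alternatives with '|'.
    return re.compile('|'.join(re.escape(s) for s in strings))


def find_pattern(lines, pattern):
    # Return the first match found on each line that contains one.
    matches = []
    for line in lines:
        found = pattern.search(line)
        if found:
            # found is a match object; group() gives the matched text.
            logging.info('found - %r', found.group())
            matches.append(found.group())
    return matches


# Example strings chosen for illustration, not taken from the question.
example_pattern = build_pattern(['a.b', 'c|d'])
example_matches = find_pattern(['axb', 'has a.b here', 'only c', 'c|d'], example_pattern)
```

Without the escaping, 'a.b' would also match 'axb', and 'c|d' would match a lone 'c'.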


1 Comment

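You should also sort the list from longest to shortest. Otherwise you will not get the results you expect: alternation takes the first alternative that matches, so a short string such as '\xa0' listed before '\xa0\xa0' will always win.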
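A small sketch of the ordering effect, using a subset of the strings from the question and assuming Python 3 text strings. The variable names are only for illustration.

```python
import re

# Subset of the question's strings, with the short ones listed first.
pattern_strings = ['\xa0', '\xc2', '\xc2\xa0', '\xa0\xa0']

# Joined in the given order: '\xc2' is tried before '\xc2\xa0'.
unsorted_pattern = re.compile('|'.join(re.escape(s) for s in pattern_strings))

# Joined longest first: '\xc2\xa0' is tried before its one-character prefix.
by_length = sorted(pattern_strings, key=len, reverse=True)
sorted_pattern = re.compile('|'.join(re.escape(s) for s in by_length))

text = 'x\xc2\xa0y'
unsorted_match = unsorted_pattern.search(text).group()
sorted_match = sorted_pattern.search(text).group()
```

Here unsorted_match is the single character '\xc2', while sorted_match is the full two-character sequence '\xc2\xa0'.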
