I have a pattern compiled as
pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
join_pattern = '|'.join(pattern_strings)
pattern = re.compile(join_pattern)
and then I search for the pattern in a file with
def find_pattern(path):
    with open(path, 'r') as f:
        for line in f:
            print line
            found = pattern.search(line)
            if found:
                print dir(found)
                logging.info('found - ' + found)
and the input file at path contains
\xc2d
d\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
'619d813\xa03697'
When I run this program, nothing happens.
It is not able to catch these patterns. What am I doing wrong here?
Desired output: every line, because each line contains at least one of the patterns.
Update
After changing the regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
It is still the same, no output
Update
After changing the regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
join_pattern = '[' + '|'.join(pattern_strings) + ']'
pattern = re.compile(join_pattern)
Things started to work, but only partially. The lines that are still not caught are
\xc2\xa0
\xc3\xa7
\xa0\xa0
for which the corresponding pattern strings are ['\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0']
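Use join_pattern = "(" + "|".join(pattern_strings) + ")" instead of the [ ] version, since [] is a character class and only matches a single character. You should also order your list from longest to shortest: alternation tries the alternatives left to right and takes the first one that matches, so a short pattern such as \\xc2 listed before \\xc2\\xa0 would match first and the longer one would never be tried.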
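
Here is a minimal sketch of that advice. It assumes the file holds these sequences as literal text (a backslash followed by x, c, 2 and so on) rather than as the actual bytes. The helper name find_matches and the use of re.escape are my own additions for illustration: re.escape makes each backslash match literally, which saves counting how many backslashes the pattern needs.

```python
import re

# Literal text to look for. Raw strings keep every backslash as a real character.
# Assumption: the input lines contain these sequences as plain text, not as bytes.
pattern_strings = [r'\xc2d', r'\xa0', r'\xe7', r'\xc3\ufffdd', r'\xc2\xa0',
                   r'\xc3\xa7', r'\xa0\xa0', r'\xc2', r'\xe9']

# Longest first, so r'\xc2\xa0' is tried before its prefix r'\xc2'.
ordered = sorted(pattern_strings, key=len, reverse=True)

# re.escape turns each backslash into a literal match instead of a regex escape.
pattern = re.compile('|'.join(re.escape(s) for s in ordered))


def find_matches(lines):
    """Return (line, matched_text) pairs for every line containing a pattern."""
    results = []
    for line in lines:
        found = pattern.search(line)
        if found:
            # found.group() is the matched text; the match object itself
            # cannot be concatenated to a string.
            results.append((line, found.group()))
    return results
```

Calling find_matches(f) on an open file object should work the same way, since it only iterates over lines. No outer parentheses are needed here because the whole expression is the alternation; add a group only if you combine it with other pattern parts.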