0

How do I use a list variable in a regexp? The problem is here:

re.search(re.compile(''.format('|'.join(map(re.escape, kand))), corpus.raw(fileid)))

The error is:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

A simple re.search works well, but I need to build the first argument of re.search from a list:

for fileid in corpus.fileids():
    if re.search(r'[Чч]естны[й|м|ого].труд(а|ом)', corpus.raw(fileid)):
        dict_features[fileid]['samoprezentacia'] = 1
    else:
        dict_features[fileid]['samoprezentacia'] = 0

if re.search(re.compile('\b(?:%s)\b'.format('|'.join(map(re.escape, kand))), corpus.raw(fileid))):
    dict_features[fileid]['up'] = 1
else:
    dict_features[fileid]['up'] = 0

return dict_features

By the way, kand is a list:

kand = [line.strip() for line in open('kand.txt', encoding="utf8")]

When printed, kand is ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']

Edit: I am using Python 3.3.2 with WinPython on Windows 7. Full traceback:

Traceback (most recent call last):
  File "F:/Python/NLTK packages/agit_classify.py", line 59, in <module>
    print (regexp_features(agit_corpus))
  File "F:/Python/NLTK packages/agit_classify.py", line 53, in regexp_features
    if re.search(re.compile(r'\b(?:{0})\b'.format('|'.join(map(re.escape, kandidats_all))), corpus.raw(fileid))):
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\re.py", line 214, in compile
    return _compile(pattern, flags)
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\sre_compile.py", line 494, in compile
    p = sre_parse.parse(p, flags)
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\sre_parse.py", line 748, in parse
    p = _parse_sub(source, pattern, 0)
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "F:\WinPython-32bit-3.3.2.0\python-3.3.2\lib\sre_parse.py", line 453, in _parse
    if state.flags & SRE_FLAG_VERBOSE:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
  • What are you expecting ''.format to do? Commented Aug 8, 2013 at 21:58
  • Why would you ever do re.search(re.compile(…))? Just passing the regexp pattern string to re.search does the exact same thing. Or, if you need to compile the regexps explicitly (e.g., performance, because you're going to use them repeatedly, or just to refactor your code), just use the search method on the compiled regexp. Commented Aug 8, 2013 at 22:01
  • I assumed format would turn the list into apple|banana|peach. I copied this from another answer on Stack Overflow. Commented Aug 8, 2013 at 22:01
  • Also, I can't see how you could ever get that error from that line of code. Can you give us the complete traceback, and the actual line of code it refers to? Commented Aug 8, 2013 at 22:03
  • I experimented and found that this works, but I'm not sure it's right or Pythonic: re.search(re.compile('\b(?:%s)\b' + '|'.join(map(re.escape, kandidats_all))), corpus.raw(fileid)) Commented Aug 8, 2013 at 22:12

2 Answers

2

The reason you're getting the actual exception is mismatched parentheses. Let's break it up to make it clearer:

re.search(
    re.compile(
        ''.format('|'.join(map(re.escape, kand))), 
        corpus.raw(fileid)))

In other words, you're passing a string, corpus.raw(fileid), as the second argument to re.compile, not as the second argument to re.search.

That means you're trying to use it as the flags value, which is supposed to be an integer. When re.compile tries to use the & operator on your string to test each flag bit, it raises a TypeError.
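
A minimal sketch of that failure in isolation. The pattern, the text, and the helper name compile_with_text_as_flags are made up for illustration; they are not the asker's data:

```python
import re

def compile_with_text_as_flags(pattern, text):
    """Reproduce the bug: `text` lands in re.compile's flags slot."""
    try:
        # The second positional argument of re.compile is flags, not the text.
        re.compile(pattern, text)
    except TypeError as exc:
        return type(exc).__name__
    return None

result = compile_with_text_as_flags(r'\b(?:apple|plum)\b', 'some document text')
print(result)
```

The exact message may differ between Python versions, but a string in the flags position should fail the same way as in the traceback above.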

And if you got past this error, the re.search would itself raise a TypeError because you're only passing it one argument rather than two.

This is exactly why you shouldn't write overly-complicated expressions. They're very painful to debug. If you'd written this in separate steps, it would be obvious:

escaped_kand = map(re.escape, kand)
alternation = '|'.join(escaped_kand)
whatever_this_was_supposed_to_do = ''.format(alternation)
regexpr = re.compile(whatever_this_was_supposed_to_do, corpus.raw(fileid))
re.search(regexpr)

This would also make it obvious that half the work you're doing isn't needed in the first place.

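First, wrapping the pattern in re.compile just to hand the result to re.search gains nothing. re.search takes a pattern as its first argument, and although it also accepts an already-compiled regexpr, it simply uses that object as is. So, that whole part of the expression is useless. Just pass the pattern itself.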

Or, if you have a good reason to compile the regexpr, as the re.compile documentation explains, the resulting regular expression object "can be used for matching using its match() and search() methods". So use the compiled object's search method, not the top-level re.search function.

Second, I don't know what you expected ''.format(anything) to do, but it can't possibly return anything but ''.
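
Putting those points together, here is one way the check could be written. This is a sketch, not the only fix: it assumes the goal is a whole-word match against any entry in kand, the helper name has_kand is hypothetical, and the list is hard-coded from the question's sample output instead of being read from kand.txt:

```python
import re

# Sample list from the question; in the original it is read from kand.txt.
kand = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']

# Build the pattern in separate steps so each one can be inspected.
escaped_kand = [re.escape(word) for word in kand]
alternation = '|'.join(escaped_kand)
regexpr = re.compile(r'\b(?:{0})\b'.format(alternation))

def has_kand(text):
    """Return 1 if any word from kand occurs in text as a whole word, else 0."""
    # Call search on the compiled object; the text is its only argument.
    return 1 if regexpr.search(text) else 0

print(has_kand('a ripe plum'))      # 1
print(has_kand('pineapples only'))  # 0, no entry matches as a whole word
```

Compiling once outside the loop and calling regexpr.search(corpus.raw(fileid)) for each fileid keeps the text out of re.compile's argument list entirely.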

1

You're mixing old and new string formatting rules. Also, you need to use raw strings with a regex, or \b will mean backspace, not word boundary.

'\b(?:%s)\b'.format('|'.join(map(re.escape, kand)))

should be

r'\b(?:{0})\b'.format('|'.join(map(re.escape, kand)))
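
A quick way to see both problems is to print the resulting strings. The alternation 'apple|plum' is a stand-in for the joined list:

```python
# Old-style placeholder in a non-raw string, passed to str.format:
# '\b' becomes a backspace character and '%s' is left untouched.
old_style = '\b(?:%s)\b'.format('apple|plum')

# Raw string with a str.format placeholder.
new_style = r'\b(?:{0})\b'.format('apple|plum')

# The % operator also works, as long as the string is raw.
percent_style = r'\b(?:%s)\b' % 'apple|plum'

print(repr(old_style))      # '\x08(?:%s)\x08'
print(repr(new_style))      # '\\b(?:apple|plum)\\b'
print(new_style == percent_style)  # True
```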

Furthermore, be aware that \b only works if your "words" start and end with alphanumeric characters (or _).
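
To illustrate with a made-up entry, 'c++' (not in the asker's list): \b fails after the final '+', because neither '+' nor the following space is a word character. If such entries matter, one possible workaround, offered here as an assumption and not as part of the original answer, is to replace \b with lookarounds:

```python
import re

words = ['c++', 'kiwi']  # hypothetical list containing a non-alphanumeric entry
alternation = '|'.join(map(re.escape, words))

with_b = re.compile(r'\b(?:{0})\b'.format(alternation))
# Require "no word character" on each side instead of a word boundary.
with_lookarounds = re.compile(r'(?<!\w)(?:{0})(?!\w)'.format(alternation))

text = 'I like c++ a lot'
print(with_b.search(text))                    # None
print(with_lookarounds.search(text).group())  # c++
print(with_lookarounds.search('two kiwis'))   # None, still whole-word only
```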

2 Comments

I experimented and found that this works, but I'm not sure it's right or Pythonic: re.search(re.compile('\b(?:%s)\b' + '|'.join(map(re.escape, kandidats_all))), corpus.raw(fileid))
Why not re.compile('\b(?:%s)\b' + '|'.join(map(re.escape, kandidats_all))).search(corpus.raw(fileid))? Then you're calling search on the compiled regex object.
