
I'm having trouble getting set operators to work in the regex module (regex 2013-11-29) under Python 3. For example, to match runs of ASCII characters minus punctuation, I have tried:

import regex as rx

data = '(foo)'
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+',data):
    print(m.group(0))     # expect 'foo', getting '(foo)'

The documentation gives this example:

[\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'
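For reference, here is a small sketch that runs the documentation's example. It assumes a current release of the regex module, and the input string is made up for illustration. It enables Version 1 explicitly, because set operations and nested sets such as [0-9] inside a class are Version 1 features.

```python
import regex

# Set difference: every Unicode number (\p{N}) except the ASCII digits 0-9.
# regex.V1 is required so that "--" and the nested [0-9] are parsed as set syntax.
pattern = regex.compile(r"[\p{N}--[0-9]]", regex.V1)

# Hypothetical input: '1', '2', VULGAR FRACTION ONE HALF, ARABIC-INDIC DIGIT THREE
text = "12\u00bd\u0663"
print(pattern.findall(text))  # the fraction and the Arabic-Indic digit survive
```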

Am I missing something here?


1 Answer


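It sounds like you need to explicitly opt into Version 1 behaviour so that "--" is interpreted as the set-difference operator rather than as literal hyphens to include in the class. Without that, the class still contains all of \p{ASCII}, which is why '(' and ')' are matched along with 'foo'.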
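Applied to the code from the question, a minimal fix might look like this (assuming a current regex release); the only change is passing flags=rx.V1:

```python
import regex as rx

data = '(foo)'
# flags=rx.V1 opts into Version 1 behaviour, so "--" is parsed as set
# difference: ASCII characters minus those in the punctuation category \p{P}.
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+', data, flags=rx.V1):
    print(m.group(0))     # prints: foo
```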

From the module web page:

Version 1 behaviour (new behaviour, different from the current re module):

Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
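The three spellings mentioned there should be interchangeable. The sketch below compiles one hypothetical pattern (lowercase ASCII letters minus vowels) each way and checks that they agree; the pattern and input are illustrative only.

```python
import regex

text = "education"  # hypothetical input

# The same set-difference pattern, with Version 1 requested in three ways.
spellings = [
    regex.compile(r"[[a-z]--[aeiou]]+", regex.VERSION1),  # long flag name
    regex.compile(r"[[a-z]--[aeiou]]+", regex.V1),        # short flag name
    regex.compile(r"(?V1)[[a-z]--[aeiou]]+"),             # inline flag
]

for p in spellings:
    print(p.findall(text))  # runs of consonants only
```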

  • .split will split a string at a zero-width match.

  • Inline flags apply to the end of the group or pattern, and they can be turned off.

  • Nested sets and set operations are supported.

  • Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in the longer term it will be VERSION1.
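A couple of the other Version 1 differences from the list above can be seen directly. This is a sketch assuming a current regex release, with made-up inputs; it opts into V1 explicitly in each call and only prints DEFAULT_VERSION rather than assuming its value, since the note above says it is expected to change.

```python
import regex

# split() splits at a zero-width match: here, the empty match before each 'b'.
parts = regex.split(r"(?=b)", "abab", flags=regex.V1)
print(parts)  # ['a', 'ba', 'b']

# Case-insensitive matching uses full case-folding, so 'ss' matches 'ß' (U+00DF).
m = regex.match(r"(?iV1)strasse", "stra\u00dfe")
print(m.span())  # (0, 6)

# Which version applies when no flag is given in the installed release:
print("default is VERSION1:", regex.DEFAULT_VERSION == regex.VERSION1)
```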
