5

Can someone tell me why this regex works fine on online regex websites but not when using re.compile() in Python?

I have used this website: https://regex101.com/ and the test string is:

"test": "value"

Python code

import re

x = r'((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^")]))+")|(?:\w+)))(:|~)\s+((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^"]))+")|(?:\w+)))'
re.compile(x)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name
  • If you switch regex101 to Python mode it will also tell you it's broken. Commented Mar 25, 2017 at 12:31
  • I would be thankful if you could provide an explanation. Commented Mar 25, 2017 at 12:55
  • You can use the regex library instead of the default one used by Python. You can download it from here. Commented Mar 25, 2017 at 13:59
  • Just use import regex as re (pip install regex before) and you'll be good to go. Commented Mar 25, 2017 at 14:23
  • Thanks. Does anyone know why it is not implemented in Python's re module? Commented Mar 25, 2017 at 14:40

3 Answers

2

From your example string and the regex101 output, it looks like you are trying to match a Python string with the general form:

"word": "word"

That is to say, groups 1 and 3 are words that can either be in double quotes or unquoted, but with no hanging quotes; group 2 is a colon or tilde and must be followed by at least one whitespace character. So:

goodString = "\"test\": value"
badString = "test\": value"

The error raised when compiling your regex actually hints at the solution! This question sheds light on the returned error, and the Python documentation gives information on named groups.

By using named groups, you can make your expression shorter and more Pythonic!

x = r'((?P<a>\"?)\w+(?P=a))(:|~)\s+((?P<b>\"?)\w+(?P=b))'

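For clarity, the pattern breaks into three parts (these labels are not the capture-group numbers that re assigns):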

group 1 = ((?P<a>\"?)\w+(?P=a))
group 2 = (:|~)\s+
group 3 = ((?P<b>\"?)\w+(?P=b))

Groups 1 and 3 capture the presence or absence of the quotation mark in a subgroup (a and b, respectively), then check for that subgroup at the end of the word.
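As a quick check of the named-group pattern, here is a minimal sketch that runs it against a few inputs. The sample strings other than the two from this answer are made up for illustration, and the redundant backslashes before the quotes are dropped, which does not change the pattern's meaning.

```python
import re

# Named-group version of the pattern: each side remembers whether it
# opened with a quote (groups a and b) and requires the same text to close.
pattern = re.compile(r'((?P<a>"?)\w+(?P=a))(:|~)\s+((?P<b>"?)\w+(?P=b))')

# Illustrative inputs (assumed for this sketch).
samples = [
    '"test": "value"',  # both sides quoted
    '"test": value',    # left side quoted only
    'test: value',      # no quotes at all
    'test": value',     # hanging closing quote
    '"test: value',     # hanging opening quote
]

results = {}
for s in samples:
    m = pattern.match(s)
    results[s] = m.group(0) if m else None
    print(repr(s), '->', results[s])
```

The first three inputs match in full, while the two with a hanging quote return None, because the backreference (?P=a) has to repeat exactly what group a captured.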

You do not need to name the groups either! You could simply reference their number:

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'

As a final test:

import re

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'
goodString = "\"test\": value"
badString = "test\": value"
print(re.match(x, goodString))
print(re.match(x, badString))

Output:

<_sre.SRE_Match object; span=(0, 13), match='"test": value'>
None

0

Current versions of Python give a more useful error message:

>>> x = r'((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^")]))+")|(?:\w+)))(:|~)\s+((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^"]))+")|(?:\w+)))'
>>> re.compile(x)
...
error: bad character in group name '?="' at position 4

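This error message is trying to communicate that the stdlib re module has a restriction on group names: they must be valid identifiers. In a conditional of the form (?(...)yes|no), re expects a group name or number inside the inner parentheses, so it reads the lookahead ?=" as a group name and rejects it.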

If you're getting this error "bad character in group name", check that your named groups are valid Python identifiers. For example "foo2" is a valid identifier so this pattern compiles:

>>> "foo2".isidentifier()
True
>>> re.compile(r"(?P<foo2>)(?P=foo2)")
re.compile(r'(?P<foo2>)(?P=foo2)', re.UNICODE)

But "2foo" is not a valid identifier and will cause a similar error message:

>>> "2foo".isidentifier()
False
>>> re.compile(r"(?P<2foo>)(?P=2foo)")
...
error: bad character in group name '2foo' at position 4
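The stdlib re module does support conditionals, but only ones keyed on an earlier group, written (?(id/name)yes|no). The following is a minimal sketch of how the "closing quote only if there was an opening quote" idea can be expressed that way. The group names q1 and q2 are made up for this example, and the pattern is a simplification: unlike the original, it does not handle escaped characters or spaces inside the quotes.

```python
import re

# (?P<q1>")?   optionally capture an opening quote
# (?(q1)")     require a closing quote only if q1 took part in the match
pair = re.compile(
    r'(?P<q1>")?\w+(?(q1)")'   # left token
    r'(?::|~)\s+'              # separator, then at least one whitespace
    r'(?P<q2>")?\w+(?(q2)")'   # right token
)

checks = {
    s: pair.fullmatch(s) is not None
    for s in ['"test": "value"', 'test: value', '"test: value', 'test": value']
}
for s, ok in checks.items():
    print(repr(s), '->', ok)
```

Here the condition is a group name, which is a valid identifier, so the pattern compiles. A lookaround such as (?=") in that position is what triggers the "bad character in group name" error.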


-2

If you want abilities beyond standard re, try this one: https://bitbucket.org/mrabarnett/mrab-regex

It is a drop-in replacement for re, but supports many more features, including conditional patterns.
