Python regex error: bad character in group name

Question

Can someone tell me why this regex works fine on online regex websites but fails when passed to re.compile() in Python?

I tested it on https://regex101.com/ with this string:

"test": "value"

Python code

import re

x = r'((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^")]))+")|(?:\w+)))(:|~)\s+((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^"]))+")|(?:\w+)))'
re.compile(x)

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name

If you switch regex101 to Python mode it will also tell you it's broken.
– jonrsharpe, Commented Mar 25, 2017 at 12:31
You can use the regex library instead of the default one used by Python. You can download it from here.
– rock321987, Commented Mar 25, 2017 at 13:59
Just use import regex as re (pip install regex before) and you'll be good to go.
– Jan, Commented Mar 25, 2017 at 14:23
Thanks. Does anyone know why this is not implemented in Python's re module?
– Abhishek Agarwal, Commented Mar 25, 2017 at 14:40

PeanutButterVibes · Accepted Answer · 2018-10-23 20:30:31Z

From your example string and the regex101 output, it looks like you are trying to match a Python string with the general form:

"word": "word"

That is to say, groups 1 and 3 are words that are either wrapped in double quotes or not quoted at all, with no hanging quotes; group 2 is a colon or tilde and must be followed by at least one whitespace character. So:

goodString = "\"test\": value"
badString = "test\": value"

The problem with your regex compile string actually hints towards the solution! This question sheds light on the returned error and the Python documentation gives information on named groups.

By using named groups, you can make your expression shorter and more Pythonic!

x = r'((?P<a>\"?)\w+(?P=a))(:|~)\s+((?P<b>\"?)\w+(?P=b))'

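For clarity, the pattern has three logical parts (these are not the regex engine's own group numbers, since the nested subgroups are counted too):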

group 1 = ((?P<a>\"?)\w+(?P=a))
group 2 = (:|~)\s+
group 3 = ((?P<b>\"?)\w+(?P=b))

Groups 1 and 3 capture the presence or absence of the quotation mark in a subgroup (a and b, respectively), then check for that subgroup at the end of the word.
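As a quick check of the named-group version, here is a minimal sketch. The sample strings and the variable name pattern are illustrative assumptions, not taken from the question. Note that the engine counts the nested subgroups too, so the separator is group 3 and the second word is group 4.

```python
import re

# Named-group pattern from above: each word may be wrapped in double quotes,
# and the closing quote must mirror the opening one via (?P=name).
pattern = re.compile(r'((?P<a>"?)\w+(?P=a))(:|~)\s+((?P<b>"?)\w+(?P=b))')

m = pattern.match('"test": "value"')
# Engine numbering: 1 = first word, 2 = a, 3 = separator, 4 = second word, 5 = b
print(m.group(1), m.group(3), m.group(4))
print(repr(m.group('a')))  # the quote (or empty string) captured by subgroup a

# A hanging quote on the first word prevents a match.
print(pattern.match('test": value'))
```

The backreference (?P=a) matches an empty string when no opening quote was captured, which is why unquoted words also pass.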

You do not need to name the groups either! You could simply reference their number:

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'

As a final test:

import re

x = r'((\"?)\w+(\2))(:|~)\s+((\"?)\w+(\6))'
goodString = "\"test\": value"
badString = "test\": value"
print(re.match(x,goodString))
print(re.match(x,badString))

Output:

<_sre.SRE_Match object; span=(0, 13), match='"test": value'>
None

wim · Accepted Answer · 2024-06-11 17:46:49Z

Current versions of Python give a more useful error message:

>>> x = r'((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^")]))+")|(?:\w+)))(:|~)\s+((?(?=")(?:"(?(?<=\\)(?:.)|(?:[^"]))+")|(?:\w+)))'
>>> re.compile(x)
...
error: bad character in group name '?="' at position 4

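This error message is trying to communicate that the stdlib re module has a restriction on group names: they must be valid identifiers. In this pattern, (?( begins a conditional of the form (?(id/name)yes|no), where re accepts only a group number or name as the condition, so the lookahead ?=" is read as a malformed group name.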
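For comparison, the stdlib re module does accept a conditional whose condition is a group number or name. The sketch below is one possible way to express "closing quote only if there was an opening quote" with that feature; the pattern and the name word are illustrative assumptions, not a rewrite of the full regex from the question.

```python
import re

# (")? optionally captures an opening quote as group 1;
# (?(1)") then requires a closing quote only if group 1 took part in the match.
word = re.compile(r'(")?\w+(?(1)")')

for candidate in ['"test"', 'test', '"test', 'test"']:
    # fullmatch is used so that a stray trailing quote is not silently ignored
    print(candidate, bool(word.fullmatch(candidate)))
```

Only the first two candidates match in full; the two with a hanging quote do not.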

If you're getting this error "bad character in group name", check that your named groups are valid Python identifiers. For example "foo2" is a valid identifier so this pattern compiles:

>>> "foo2".isidentifier()
True
>>> re.compile(r"(?P<foo2>)(?P=foo2)")
re.compile(r'(?P<foo2>)(?P=foo2)', re.UNICODE)

But "2foo" is not a valid identifier and will cause a similar error message:

>>> "2foo".isidentifier()
False
>>> re.compile(r"(?P<2foo>)(?P=2foo)")
...
error: bad character in group name '2foo' at position 4
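If patterns arrive from elsewhere (for example, pasted from a tester that targets another regex flavor), it can help to catch the compile error and report it. A minimal sketch, where the helper name describe_pattern_error is hypothetical; it relies only on re.error and its msg and pos attributes, which are available in Python 3.5 and later:

```python
import re

def describe_pattern_error(pattern):
    """Return None if the pattern compiles, else a short description of the error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # exc.msg is the bare message; exc.pos is the offset in the pattern (may be None)
        return "{} (position {})".format(exc.msg, exc.pos)
    return None

print(describe_pattern_error(r"(?P<foo2>)(?P=foo2)"))  # compiles, so None
print(describe_pattern_error(r"(?P<2foo>)(?P=2foo)"))  # invalid group name
```

The exact wording of the message can differ between Python versions, so code should not depend on matching it literally.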

Chuancong Gao · Accepted Answer · 2017-03-25 18:31:33Z

If you want abilities beyond standard re, try this one: https://bitbucket.org/mrabarnett/mrab-regex

It is a drop-in replacement for re that supports many more features, including conditional patterns.
