0

I would like to write a regular expression for formatting a text in which a { character may not appear unless it is preceded by a backslash \. The problem is that a backslash can escape itself, so I don't want to match \\{, for example, but I do want to match \\\{. In other words, I only want an odd number of backslashes before a {. I can't just capture them in a group and count the backslashes afterwards, like this:

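import re

s = r"a wei\\\{rd thing\\\\\{"
matches = re.finditer(r"([^\{]|(\\+)\{)+", s)
for match in matches:
    backslashes = match.group(2)  # only the last repetition of group 2 is kept
    if backslashes is None or len(backslashes) % 2 == 0:  # skip even counts
        continue
    do_some_things()  # placeholder for the real processing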

Because group 2 can be matched more than once, I can only access its last occurrence (in this case, \\\\\). It would be really nice if we could write something like "([^\{]|(\\+)(?if len(\2) / 2 == len(\2) // 2)\{)+" as the regular expression, but as far as I know that is impossible. How can I do this?

2
  • Counting the number of backslashes is an NP-hard problem.. Commented May 9, 2020 at 13:16
  • Beside the point, but to check if something's even, use modulo-2: 0 % 2 == 0, 1 % 2 == 1, 2 % 2 == 0, etc Commented May 9, 2020 at 18:56

2 Answers

1

This matches an odd number of backslashes followed by a brace:

(?<!\\)(\\\\)*(\\\{)

Breakdown:

  • (?<!\\) - Not preceded by a backslash, to accommodate the next bit
    • This is called "negative lookbehind"
  • (\\\\)* - Zero or more pairs of backslashes
  • (\\\{) - A backslash then a brace

Matches:

\{
\\\{
\\\\\{

Non-matches:

\\{
\\\\{
\\\\\\{
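
To check the lists above in Python, here is a minimal sketch using the standard `re` module. The names `ESCAPED_BRACE` and `find_escaped_braces` are made up for this illustration and are not part of the answer.

```python
import re

# Pattern from the answer: not preceded by a backslash, then zero or more
# backslash pairs, then one backslash followed by the brace.
ESCAPED_BRACE = re.compile(r"(?<!\\)(\\\\)*(\\\{)")


def find_escaped_braces(text):
    """Return each odd-length backslash run plus its brace found in text."""
    return [m.group(0) for m in ESCAPED_BRACE.finditer(text)]


# Odd runs are reported, even runs give an empty list.
for sample in [r"\{", r"\\\{", r"\\\\\{", r"\\{", r"\\\\{", r"\\\\\\{"]:
    print(sample, "->", find_escaped_braces(sample))

# The string from the question contains runs of 3 and 5 backslashes.
print(find_escaped_braces(r"a wei\\\{rd thing\\\\\{"))
```

Because `finditer` yields one match per brace, each backslash run can be inspected separately, which avoids the "only the last repetition of the group is kept" issue from the question.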

Try it on RegExr


This was partly inspired by Vadim Baratashvili's answer


3 Comments

Thank you, but is there a way to not match the backslashes? I would like to put everything in a lookbehind, but Python doesn't allow variable-length text in a lookbehind. Also, this is a little bit dumb, because Python could just reverse the expression in the lookbehind ((abc)+ would become (cba)+) and match as usual but backwards...
@Bananasmoothii Yes, just use a non-capturing group, (?:\\\\)*
I know that syntax, but I have a regex that looks like \{([^\{%]+)(%[^\{%]+)?(?<!\\)(?:\\\\)*\} and I would like the first group to be blue and the second one to be green, but in "{a%b\\\\}", the backslashes stay black because they are in neither the first nor the second group...
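
A note on the non-capturing variant from this thread: (?:\\\\)* still consumes the backslash pairs, it just doesn't store them in a group. The sketch below compares the span of the whole match with the span of the one remaining capture group. The names `PATTERN` and `escape_spans` are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as the answer, with the backslash pairs in a non-capturing group.
PATTERN = re.compile(r"(?<!\\)(?:\\\\)*(\\\{)")


def escape_spans(text):
    """Return (whole_match_span, escaped_brace_span) for every hit in text."""
    return [(m.span(), m.span(1)) for m in PATTERN.finditer(text)]


# 'a', three backslashes, '{', 'b': the whole match covers all three
# backslashes and the brace, while group 1 covers only the final "\{".
print(escape_spans(r"a\\\{b"))
```

If only the escaping backslash and the brace matter, `m.span(1)` gives their position without needing a variable-length lookbehind.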
0

I think you can use this as solution: ([^\\](\\\\){0,})(\{)

We check that between the last character that is not a backslash and the { there are zero or more pairs of backslashes. If part of the text matches the pattern, we can replace it with the first group, $1 (a character that is not a backslash plus zero or more pairs of backslashes), so we find and remove every unescaped {.

If we want to find an escaped {, we can use this expression: ([^\\](\\\\){0,})(\\\{). Here the \{ is captured by the last group, which is the second top-level group and capture group number 3.
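
To see how the second expression behaves in Python, here is a minimal sketch. The names `ESCAPED_AFTER_CHAR` and `escaped_brace_span` are invented for this example. One thing to be aware of: the pattern starts with [^\\], so it needs a non-backslash character before the backslash run and will not find an escaped brace at the very start of the string.

```python
import re

# Second expression from this answer: one non-backslash character, zero or
# more backslash pairs, then the escaped brace in capture group 3.
ESCAPED_AFTER_CHAR = re.compile(r"([^\\](\\\\){0,})(\\\{)")


def escaped_brace_span(text):
    """Return the span of the first escaped brace (group 3), or None."""
    m = ESCAPED_AFTER_CHAR.search(text)
    return m.span(3) if m else None


print(escaped_brace_span(r"a\{"))    # odd run after a character
print(escaped_brace_span(r"a\\\{"))  # odd run of three
print(escaped_brace_span(r"a\\{"))   # even run: no escaped brace
print(escaped_brace_span(r"\{"))     # start of string: nothing for [^\\] to match
```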

6 Comments

Awesome. I just rearranged it to make it clearer. I can't get it to work though. It doesn't find any matches in the example.
@wjandrea if you add a { with paired backslashes or without backslashes to the text, that { will be replaced by an empty string, or simply found if we do not need to replace anything (then we have to process it as needed)
I think that's the opposite of what OP wants, but the question is not totally clear. I posted my own answer partly inspired by yours though, thanks!
@wjandrea or you can use ([^\\](\\\\){0,})(\\\{) to find \{ \\\{ \\\\\{
Why do you use {0,} instead of * ?