Imagine I have a multiline string containing tokens of the format {{ string_a }}. Each token can sit either on its own line, with possible leading whitespace, or on the same line as some other markup.

a: {{ string_a }}

b:
  {{ string_b }}

I am trying to write a regex which will match the contents of these tokens and do a replacement, but I would also like to run some conditional logic based on whether or not they exist on their own line.

My original regex is pretty basic: \{\{\s*([A-Za-z_]+)\s*\}\} and it does a fine job of matching the tokens. However, when I try to match the leading whitespace and put it into a capturing group, it only matches the tokens that start a line (after any whitespace), when really I want all the tokens regardless:

(^\s*)\{\{\s*([A-Za-z_]+)\s*\}\}

I imagine that the solution is to use some sort of lookahead/lookbehind, but whichever I try, it seems to break. Not sure if this is because of the ^ or the * in that first group, but it doesn't like it either way.

So, what I'm trying to get as my capture list is the following:

  • The full token {{ string_a }}
  • The inner string string_a
  • The leading whitespace, if any (e.g. \s\s\s), to do boolean conditional logic on
  • Python tag because it's pythonic regex syntax, but could remove if it seems unnecessary Commented Dec 11, 2017 at 8:54
  • That's fine, actually we basically require all regex questions to also specify the language or platform, ideally with a suitable tag. Commented Dec 11, 2017 at 8:56
  • If it's an optional space at the start of the string, could you not start your regex with (\s)? Commented Dec 11, 2017 at 8:59
  • This looks suspiciously like you are attempting to parse YAML or some other structured format with regular expressions. Don't do that. Google for cthulhu zalgo html. Commented Dec 11, 2017 at 8:59
  • And yes, I've been on SO for over 6 years, I've seen the cthulhu post ;) Commented Dec 11, 2017 at 9:01

1 Answer

Try (^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}. The only difference is a single ?.

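The problem with (^\s*) is that the ^ is not optional: the group can only succeed at the start of a line, so a token in the middle of a line never matches. Making the whole group optional lets the regex fall back to matching the token alone, in which case the first group is simply left unset.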
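As a rough sketch of how the optional group could drive the conditional replacement asked about in the question: the sample text, the `VALUES` mapping and the `render` helper below are made up for illustration, and `re.MULTILINE` is assumed so that `^` matches at the start of every line rather than only at the start of the string.

```python
import re

# The pattern from the answer; re.MULTILINE lets ^ match at each line start.
TOKEN = re.compile(r"(^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}", re.MULTILINE)

# Hypothetical sample data, not taken from the original question.
TEXT = "a: {{ string_a }}\n\nb:\n  {{ string_b }}\n"
VALUES = {"string_a": "hello", "string_b": "first\nsecond"}


def render(text, values):
    """Replace each token, re-indenting values whose token starts a line."""

    def replace(match):
        indent, name = match.group(1), match.group(2)
        value = values[name]
        if indent is None:
            # Token shares its line with other markup: substitute inline.
            return value
        # Token starts its line: prefix every line of the value with the
        # captured indentation (an arbitrary choice for this sketch).
        return "\n".join(indent + line for line in value.splitlines())

    return TOKEN.sub(replace, text)


print(render(TEXT, VALUES))
```

Group 1 is `None` for an inline token and a string for a token that starts a line. That string is empty when the token sits at column 0, so `indent is None` is a safer test than plain truthiness. Note also that `\s` matches newlines, so a blank line directly before an indented token would end up inside group 1.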

5 Comments

Ah that's so very simple and should've been obvious! Pre-9am pre-coffee Monday brain to blame. Was trying to overcomplicate with lookbehinds. Thanks
By the way, escaping { and } is only necessary when it would be ambiguous to the parser. (^\s*)?{{\s*([A-Za-z_]+)\s*}} works just as well.
Yeah I just do it as a matter of habit really. Quick extra question, it seems my re.finditer is not actually matching the whitespace group. Always comes out as None. Is there a flag I'm missing? Cheers
You need the m flag (re.MULTILINE in Python), as far as I can tell.
D'oh. Appreciate it
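
To illustrate the flag exchange above, here is a small sketch comparing `re.finditer` with and without `re.MULTILINE`. The `SAMPLE` string and the `leading_groups` helper are assumptions made for the example. Without the flag, `^` only matches at the very start of the string, so the whitespace group is never set for a token further down.

```python
import re

PATTERN = r"(^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}"

# Hypothetical input: one inline token, one indented token on its own line.
SAMPLE = "a: {{ string_a }}\nb:\n  {{ string_b }}\n"


def leading_groups(flags=0):
    """Return group 1 (the leading whitespace) for every token found."""
    return [m.group(1) for m in re.finditer(PATTERN, SAMPLE, flags)]


print(leading_groups())              # [None, None]
print(leading_groups(re.MULTILINE))  # [None, '  ']
```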
