Match regex string, but also return optional leading whitespace

Question

Imagine I have a multiline string that contains tokens of the format {{ string_a }}. Each token can sit either on its own line, with possible leading whitespace, or on the same line as some other markup.

a: {{ string_a }}

b:
  {{ string_b }}

I am trying to write a regex which will match the contents of these tokens and do a replacement, but I would also like to run some conditional logic based on whether or not they exist on their own line.

My original regex is pretty basic: \{\{\s*([A-Za-z_]+)\s*\}\} and it does a fine job of matching the tokens. However, when I try and match the leading whitespace and put that into a capturing group, it only matches those with whitespace, when really I want all the tokens regardless:

(^\s*)\{\{\s*([A-Za-z_]+)\s*\}\}

I imagine that the solution is to use some sort of lookahead/lookbehind, but whichever I try, it seems to break. Not sure if this is because of the ^ or the * in that first group, but it doesn't like it either way.

So, what I'm trying to get as my capture list is the following:

The full token {{ string_a }}
The inner string string_a
The leading whitespace, if any (e.g. \s\s\s), to do boolean conditional logic on

Python tag because it's pythonic regex syntax, but could remove if it seems unnecessary
– Matt Fletcher, Commented Dec 11, 2017 at 8:54
That's fine; actually, we basically require all regex questions to also specify the language or platform, ideally with a suitable tag.
– tripleee, Commented Dec 11, 2017 at 8:56
If it's an optional space at the start of the string, could you not start your regex with (\s)?
– WhatsThePoint, Commented Dec 11, 2017 at 8:59
This looks suspiciously like you are attempting to parse YAML or some other structured format with regular expressions. Don't do that. Google for cthulhu zalgo html.
– tripleee, Commented Dec 11, 2017 at 8:59
And yes, I've been on SO for over 6 years, I've seen the cthulhu post ;)
– Matt Fletcher, Commented Dec 11, 2017 at 9:01

Tomalak · Accepted Answer · 2017-12-11 08:58:54Z

Try (^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}. The only difference is a single ?.

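The problem with (^\s*) is that the ^ is not optional: the group can only match where ^ matches (the start of the string, or the start of a line in multiline mode), so a token that follows other text on its line can never match. Making the whole group optional lets those tokens match with the group left empty.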
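
A minimal sketch of the fix in Python, assuming the standard `re` module and the `re.MULTILINE` flag so that `^` can match at the start of every line. The sample text and the variable names are illustrative, not taken from the original post.

```python
import re

# Sample input modelled on the question; names here are illustrative.
text = "a: {{ string_a }}\n\nb:\n  {{ string_b }}\n"

# The optional group (^\s*)? captures leading whitespace when the token
# starts its line, and is skipped (None) when the token follows other text.
TOKEN = re.compile(r"(^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}", re.MULTILINE)

results = []
for m in TOKEN.finditer(text):
    leading, name = m.group(1), m.group(2)
    own_line = leading is not None  # boolean for the conditional logic
    results.append((name, own_line, leading))
    print(f"{name!r}: own_line={own_line}, leading={leading!r}")
```

Two things to keep in mind with this pattern: when group 1 participates, `m.group(0)` includes the leading whitespace as well as the token, and because `\s` also matches newlines, a blank line directly above an indented token ends up inside group 1. Using `[ \t]*` instead of `\s*` in the first group is one way to restrict the capture to spaces and tabs, if that matters.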

5 Comments

Matt Fletcher Over a year ago

Ah that's so very simple and should've been obvious! Pre-9am pre-coffee Monday brain to blame. Was trying to overcomplicate with lookbehinds. Thanks

Tomalak Over a year ago

By the way, escaping { and } is only necessary when it would be ambiguous to the parser. (^\s*)?{{\s*([A-Za-z_]+)\s*}} works just as well.
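
A quick way to check this claim in Python: the `re` module treats a `{` that does not begin a valid `{m,n}` quantifier as a literal, so both spellings should give the same matches. This is only a sketch; the sample text is illustrative, and `re.MULTILINE` is assumed so that `^` can match at line starts.

```python
import re

text = "a: {{ string_a }}\n\nb:\n  {{ string_b }}\n"

escaped = re.compile(r"(^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}", re.MULTILINE)
unescaped = re.compile(r"(^\s*)?{{\s*([A-Za-z_]+)\s*}}", re.MULTILINE)

# findall returns one (leading_whitespace, name) tuple per match;
# a group that did not participate shows up as an empty string.
escaped_hits = escaped.findall(text)
unescaped_hits = unescaped.findall(text)
print(escaped_hits == unescaped_hits)
```

Note that `findall` cannot distinguish "no group" from "empty whitespace", since both come back as `''`; `finditer` with `m.group(1) is None` keeps that distinction.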

Matt Fletcher Over a year ago

Yeah I just do it as a matter of habit really. Quick extra question, it seems my re.finditer is not actually matching the whitespace group. Always comes out as None. Is there a flag I'm missing? Cheers

Tomalak Over a year ago

You need the m flag (re.MULTILINE in Python), as far as I can tell.

Matt Fletcher Over a year ago

D'oh. Appreciate it
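
To illustrate the exchange above, here is a small sketch comparing `re.finditer` with and without `re.MULTILINE`. It assumes the same kind of input as in the question; the sample text and list names are illustrative.

```python
import re

text = "a: {{ string_a }}\n\nb:\n  {{ string_b }}\n"
pattern = r"(^\s*)?\{\{\s*([A-Za-z_]+)\s*\}\}"

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so the optional group never participates for these tokens.
no_flag = [m.group(1) for m in re.finditer(pattern, text)]

# With re.MULTILINE (the "m" flag), ^ also matches right after each newline.
with_flag = [m.group(1) for m in re.finditer(pattern, text, re.MULTILINE)]

print(no_flag)
print(with_flag)
```

Without the flag, both tokens still match, but the whitespace group is `None` for each of them, which is the symptom described in the comment. With the flag, the group is populated for the token that starts its own line.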
