
I have the following parsing scenario in Python. Each line falls into one of three cases:

  1. {{ name xxxxxxCONTENTxxxxx /}}
  2. {{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}
  3. {{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}

All I need to do is classify which case a given line belongs to, using regex.

I can successfully distinguish between 1) and 2), but I'm having trouble dealing with 3).

To catch 1) I use:

re.match(r'\s*{{[^{]*?/}}\s*', line)

To catch 2) I use:

re.match(r'{{.*?}}', line)

and then raise a flag to keep the context, since case 2) can span multiple lines. How can I catch case 3)?

The condition I'm currently trying to match is:

- start with '{{'
- end with '/}}'
- with no '{{' in between

However I'm having a hard time phrasing this in regex.

  • '^{{((?!{{).)*/}}$' - See "Regular expression to match line that doesn't contain a word?" Commented Apr 5, 2016 at 8:15
  • This works well in JS, but I'm having trouble with it in Python. In JS it catches conditions 1 and 2, which is good, but in Python it gives no match. Commented Apr 5, 2016 at 8:33
  • Using pythex.org (great site btw :) ) I get that the regex matches 1 and 3, but not 2 - because it has '{{' in it. Could you post your code that didn't work? Commented Apr 5, 2016 at 8:45
  • Maybe {{(?:(?!{{).)*/}}? (maybe re.DOTALL is necessary if it spans across multiple lines) Commented Apr 5, 2016 at 8:55
  • Could you please narrow your question to what you exactly need to match and what not to match? Do you want to match {{ name xxxxxxCONTENTxxxxx /}} and {{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}} as entire strings, and not match at all {{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}? Commented Apr 5, 2016 at 9:00

1 Answer

The conditions:

- start with '{{'
- end with '/}}'
- with no '{{' in between

are a perfect fit for a tempered greedy token.

^{{(?:(?!{{|/}}).)*/}}$
   ^^^^^^^^^^^^^^^^

See regex demo.

The (?:(?!{{|/}}).)* part matches, one character at a time, any text that contains neither {{ nor /}} (so it matches up to the first /}}). The anchors (^ and $) make the pattern match only a whole string that starts with {{, ends with /}} and has no {{ inside. Note that with re.match you do not need the ^ anchor, since re.match only matches at the start of the string.
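As a quick check in Python, here is a minimal sketch that runs this pattern against the three sample lines from the question. The name `WHOLE_TAG` is made up for illustration, and the braces are escaped only for readability; that escaping is my choice, not part of the pattern above.

```python
import re

# Hypothetical name. Same pattern as above, with the braces escaped.
WHOLE_TAG = re.compile(r'^\{\{(?:(?!\{\{|/\}\}).)*/\}\}$')

# The three sample lines from the question.
samples = [
    '{{ name xxxxxxCONTENTxxxxx /}}',
    '{{ name }} xxxxxxxCONTENTxxxxxxx {{ name /}}',
    '{{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}',
]

# True where the whole line starts with {{, ends with /}} and has no {{ inside.
results = [bool(WHOLE_TAG.match(s)) for s in samples]
print(results)  # [True, False, True]
```

Cases 1 and 3 match, while case 2 is rejected because of its second {{.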

Now, to match only the 3rd type of string, you need to require a {...} substring inside the pattern:

^{{(?:(?!{{|/}}).)*{[^{}]*}(?:(?!{{|/}}).)*/}}$
   | ----  1 -----|| - 2 -||--------1-----|

See another regex demo

Part 1 is the tempered greedy token described above, and part 2, {[^{}]*}, matches a single {...} substring, making it compulsory in the input.
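Putting the pieces together, one possible way to classify a line is to test the most specific pattern first. This is only a sketch: the names `TOKEN`, `CASE3`, `CASE1_OR_3`, `OPEN_TAG` and `classify` are hypothetical, stripping surrounding whitespace is an assumption based on the `\s*` in the question's own regex, and the case 2 test is simply the question's `{{.*?}}` check.

```python
import re

# Tempered greedy token: any character that does not start "{{" or "/}}".
TOKEN = r'(?:(?!\{\{|/\}\}).)*'

# Case 3: a whole {{ ... /}} tag that must contain a single {...} group.
CASE3 = re.compile(r'^\{\{' + TOKEN + r'\{[^{}]*\}' + TOKEN + r'/\}\}$')
# Cases 1 and 3: a whole {{ ... /}} tag with no "{{" inside.
CASE1_OR_3 = re.compile(r'^\{\{' + TOKEN + r'/\}\}$')
# Case 2: the question's own check for an opening {{ ... }} tag.
OPEN_TAG = re.compile(r'\{\{.*?\}\}')


def classify(line):
    """Return 1, 2 or 3 for the cases in the question, or None if none apply."""
    line = line.strip()  # assumption: surrounding whitespace is irrelevant
    if CASE3.match(line):
        return 3
    if CASE1_OR_3.match(line):
        return 1
    if OPEN_TAG.match(line):
        return 2
    return None


print(classify('{{ name xxxxxxCONTENTxxx {comand} xxxxCONTENTxxx /}}'))  # 3
```

The order matters: every case 3 line also matches the more general pattern, so case 3 has to be tested before case 1.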
