I have the following "sample" content:

{% block some_name %}Some Text{% endblock %} 
Something Else
{% block another_name %}Some Other Content{% endblock %}

I am trying to use a regex to find both blocks: first the names, then the full sections. However, my "findall" call only returns the first one:

re.findall(r"\{% block ([^\{%]+?) %\}[\s\S]*\{% endblock %\}", contents)

(Assume the variable "contents" holds the string at the top.)

So I need two searches, or a combined one if possible, returning something like:

list[
    ['some_name', 'another_name'],
    ['{% block some_name %}Some Text{% endblock %}', '{% block another_name %}Some Other Content{% endblock %}']
]
1 Answer

You may use

r'(\{%\s*block\s+(.+?)\s*%}[\s\S]*?\{%\s*endblock\s*(?:\2\s*)?%})'

See the regex demo.

Details

  • ( - start of the outer capturing group #1 (so as to get the whole match into the list of tuples returned by re.findall):
    • \{% - a {% substring
    • \s* - 0+ whitespaces
    • block - a block substring
    • \s+ - 1+ whitespaces
    • (.+?) - 1+ chars other than line break chars (replace . with [\s\S] to also match line breaks), as few as possible, captured into Group 2
    • \s* - 0+ whitespaces
    • %} - a %} substring
    • [\s\S]*? - any 0+ chars as few as possible
    • \{% - a {% substring
    • \s* - 0+ whitespaces
    • endblock - a literal substring
    • \s* - 0+ whitespaces
    • (?:\2\s*)? - an optional sequence of the Group 2 value and 0+ whitespaces after
    • %} - a %} substring
  • ) - end of the outer capturing group #1.
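
The lazy quantifier in [\s\S]*? is what makes both blocks come back. As a rough sketch of the difference, the snippet below runs the pattern from the question next to a variant that differs only by the added ?. The variable names (greedy, lazy, greedy_names, lazy_names) are chosen here for illustration and are not from the original post:

```python
import re

contents = (
    "{% block some_name %}Some Text{% endblock %} \n"
    "Something Else\n"
    "{% block another_name %}Some Other Content{% endblock %}"
)

# Pattern from the question: the greedy [\s\S]* runs to the LAST endblock,
# so the whole string is consumed by a single match.
greedy = r"\{% block ([^\{%]+?) %\}[\s\S]*\{% endblock %\}"

# Same pattern with a lazy [\s\S]*?, which stops at the FIRST endblock.
lazy = r"\{% block ([^\{%]+?) %\}[\s\S]*?\{% endblock %\}"

greedy_names = re.findall(greedy, contents)
lazy_names = re.findall(lazy, contents)

print(greedy_names)  # ['some_name']
print(lazy_names)    # ['some_name', 'another_name']
```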

See the Python demo:

import re
rx = r'(\{%\s*block\s+(.+?)\s*%}[\s\S]*?\{%\s*endblock\s*(?:\2\s*)?%})'
s = '{% block some_name %}Some Text{% endblock %} \nSomething Else\n{% block another_name %}Some Other Content{% endblock %}'
print(list(map(list, zip(*re.findall(rx, s))))) # Extracting and transposing the list
# => [['{% block some_name %}Some Text{% endblock %}', '{% block another_name %}Some Other Content{% endblock %}'], ['some_name', 'another_name']]
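
Note that the transposed output above lists the full blocks first and the names second, while the question asked for names first. One possible way to get that order, assuming a named group is acceptable, is re.finditer with the same pattern minus the outer group. The group name "name" and the variables matches and result are illustrative choices, not part of the original answer:

```python
import re

# Same pattern as above, with Group 2 turned into a named group and the
# outer group dropped (m.group(0) already gives the whole match).
rx = re.compile(
    r'\{%\s*block\s+(?P<name>.+?)\s*%}'
    r'[\s\S]*?'
    r'\{%\s*endblock\s*(?:(?P=name)\s*)?%}'
)
s = '{% block some_name %}Some Text{% endblock %} \nSomething Else\n{% block another_name %}Some Other Content{% endblock %}'

matches = list(rx.finditer(s))
result = [
    [m.group('name') for m in matches],  # block names
    [m.group(0) for m in matches],       # full block text
]
print(result)
# => [['some_name', 'another_name'], ['{% block some_name %}Some Text{% endblock %}', '{% block another_name %}Some Other Content{% endblock %}']]
```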

2 Comments

If this is for parsing django / jinja templates, you should probably make sure that the endblock tag can optionally also have a name. {% endblock some_name %} is valid. And there can be arbitrary whitespace (including newlines) inside the {% %}.
I appreciate that. While I also work with those, that is not what this is for, but it is still very useful. Thanks!
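
For what it is worth, the two cases raised in the first comment can be checked against the regex from the answer with a short sketch. The sample strings named and multiline are made up here for illustration. Nested blocks are a separate matter: the lazy match pairs a block with the first endblock it can reach, so a real template parser is likely the safer choice for those.

```python
import re

rx = r'(\{%\s*block\s+(.+?)\s*%}[\s\S]*?\{%\s*endblock\s*(?:\2\s*)?%})'

# Case 1: the endblock tag repeats the block name.
named = '{% block some_name %}Some Text{% endblock some_name %}'

# Case 2: newlines and extra spaces inside the {% %} delimiters.
multiline = '{%\n  block some_name\n%}Some Text{%\n  endblock\n%}'

named_result = re.findall(rx, named)
multiline_result = re.findall(rx, multiline)

# Each result is a list with one (whole block, name) tuple.
print(named_result[0][1])      # some_name
print(multiline_result[0][1])  # some_name
```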
