How can I extract {{template|{{template2}}|other params}} from this string if I only know the name "template"?

{{template0}}
{{template|{{template2}}|other params}}
{{template3}}
    It looks like you are parsing MediaWiki syntax. Are you aware that there is a Python module that does this much better than any regex can? pypi.python.org/pypi/mwlib/0.13.1 Commented Dec 16, 2011 at 21:28

1 Answer

This should do what you want:

>>> import re
>>> match = re.search(r'^{{template\b.*$', your_string, re.M)
>>> match.group()
'{{template|{{template2}}|other params}}'

It uses a word boundary (\b) after 'template' so it will not match 'template0' or 'template3'. The re.M option is used so ^ and $ will match the beginnings and ends of lines, instead of the beginning and end of the string.
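To see what the word boundary and re.M each contribute, here is a small sketch that runs the same pattern with one piece removed at a time. The variable names (with_boundary, no_boundary, no_multiline) are only illustrative, and the input is the sample from the question.

```python
import re

# Sample input taken from the question.
your_string = "{{template0}}\n{{template|{{template2}}|other params}}\n{{template3}}"

# Pattern from the answer: word boundary plus multiline anchors.
with_boundary = re.search(r'^{{template\b.*$', your_string, re.M)
# Same pattern without \b: the first line ({{template0}}) now matches.
no_boundary = re.search(r'^{{template.*$', your_string, re.M)
# Same pattern without re.M: ^ only matches at the start of the string,
# where \b fails between 'e' and '0', so there is no match at all.
no_multiline = re.search(r'^{{template\b.*$', your_string)

print(with_boundary.group())  # {{template|{{template2}}|other params}}
print(no_boundary.group())    # {{template0}}
print(no_multiline)           # None
```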

Edit: Try the following regex for the newline case from your comment:

r'^{{template\b(?:[^}]\n+|\n+[^{]|.)*$'

This should work whether you put the newline before or after the |.

Edit 2: It is very important with regex questions that you specify what the input can look like up front. Here is another version that works with the text from your latest comment:

r'^{{template\b(?:[^}\n]\n+|\n+[^{\n]|.)*}}$'

Now it will handle multiple newlines correctly, and I added the }} at the end in case your match is the last bracketed group before lines with other formats.
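As a quick check of this last pattern, the sketch below runs it against the multi-line input from the comments and against a second, made-up input where a line ends with }} and the next one starts with {{. The names pattern, handled and truncated are only illustrative.

```python
import re

pattern = re.compile(r'^{{template\b(?:[^}\n]\n+|\n+[^{\n]|.)*}}$', re.M)

# Newline before the "|": the whole template is matched.
handled = "{{template0}}\n{{template|{{template2}}\n|other params}}\n{{template3}}"
print(repr(pattern.search(handled).group()))
# '{{template|{{template2}}\n|other params}}'

# A line ending in }} followed by a line starting with {{ stops the match
# early, so the result is cut off after the first nested template.
truncated = "{{template|{{a}}\n{{b}}}}"
print(repr(pattern.search(truncated).group()))
# '{{template|{{a}}'
```

So the pattern works as long as newlines never fall between a closing }} and an opening {{ inside the template; otherwise the match is cut short.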

8 Comments

If there is a \n in it, it won't match the whole thing. For example, for "{{template|{{template2}}\n|other params}}" it will match only "{{template|{{template2}}".
Then regex isn't the right tool for this job, at least not in Python where recursive regexes are not supported.
See my edit, it will continue to match on newlines as long as the current line doesn't end with }} or the next line doesn't start with {{. I think this should work depending on the format of your string.
@Mjbmr - You have to know what your stopping pattern is going to be if you also want to allow newlines. Otherwise, like Tim said, it's not the right tool. You will need to scan the string for tokens like a parser instead, changing state whenever a {{ is opened or closed, until you are back at the root level.
@F.J Newlines could be anywhere in it.
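
The token-scanning approach suggested in the comments above can be sketched as follows. This is a minimal sketch, not a MediaWiki parser: extract_template is a hypothetical helper name, and it assumes that only {{ and }} matter (triple braces, nowiki sections and comments are ignored). It tracks the nesting depth and returns the block once the depth is back at zero, so newlines can appear anywhere.

```python
def extract_template(text, name):
    """Return the first balanced {{name...}} block in text, or None."""
    opener = "{{" + name
    start = 0
    while True:
        start = text.find(opener, start)
        if start == -1:
            return None
        # Require a non-word character after the name, so that
        # 'template' does not match 'template0'.
        after = text[start + len(opener):start + len(opener) + 1]
        if after and (after.isalnum() or after == "_"):
            start += 2
            continue
        depth = 0
        i = start
        while i < len(text):
            pair = text[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    # Back at the root level: this closes the template.
                    return text[start:i]
            else:
                i += 1
        return None  # braces never balanced


sample = "{{template0}}\n{{template|{{template2}}\n{{b}}\n|other params}}\n{{template3}}"
print(repr(extract_template(sample, "template")))
# '{{template|{{template2}}\n{{b}}\n|other params}}'
```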