I feel bad for asking yet another regex question, but this has been driving me crazy for the past week.
I am trying to use regular expressions in Python to replace some text that looks like this:
text = """some stuff
line with text
other stuff
[code language='cpp']
#include <cstdio>
int main() {
printf("Hello");
}
[/code]
Maybe some
other text"""
What I want to do is capture the text inside the [code] tags, add a tab (\t) in front of each line, and then replace the whole [code]...[/code] block with these tab-prefixed lines. That is, I want the result to look like:
"""some stuff
line with text
other stuff
	#include <cstdio>
	int main() {
	printf("Hello");
	}
Maybe some
other text"""
I am using the following snippet:
import re

class CodeParser(object):
    """Parse a blog post and turn it into markdown."""

    def __init__(self):
        # DOTALL lets '.' match newlines so the pattern can span lines.
        self.regex = re.compile(r'.*\[code.*?\](?P<code>.*)\[/code\].*',
                                re.DOTALL)

    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        code = self.regex.match(text).group('code')
        # Prepend a tab to every captured line.
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)
        return self.regex.sub('\n%s\n' % code, text)
The problem is that the initial and final .* make the pattern match everything before and after the code tags, so when I perform the replacement, that surrounding text is removed too. If I remove the .*s, the regex no longer matches anything.
I thought this could be a problem with newlines, so I tried replacing all the '\n' with, say, '¬', performing the matching, and then changing the '¬' back to '\n', but I didn't have any luck with this approach.
If anyone has a better method of accomplishing what I want to accomplish, I am open to suggestions.
Thank you.
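
For reference, a minimal sketch of one possible approach to the question above, not a definitive implementation. It assumes the tags always look like [code ...] and [/code], and that the newline right after the opening tag and right before the closing tag should be dropped. The names CODE_BLOCK and indent_code_blocks are hypothetical. Two things are worth noting: re.match only matches at the start of the string, which is likely why the pattern stopped matching once the leading .* was removed, whereas re.sub scans the whole string; and re.sub accepts a function as the replacement, so the surrounding text never has to be part of the match.

```python
import re

# Hypothetical pattern: one [code ...]...[/code] block, matched non-greedily
# so that several blocks in the same post are handled independently.
# The optional \n on each side drops the newline next to the tags (assumption).
CODE_BLOCK = re.compile(r"\[code[^\]]*\]\n?(?P<code>.*?)\n?\[/code\]", re.DOTALL)


def indent_code_blocks(text):
    """Replace each [code]...[/code] block with its lines prefixed by a tab."""

    def _indent(match):
        # Only the captured code is rewritten; text outside the tags is
        # never part of the match, so it is left untouched.
        lines = match.group("code").split("\n")
        return "\n".join("\t" + line for line in lines)

    # sub() searches the whole string, unlike match(), which is anchored
    # at the beginning.
    return CODE_BLOCK.sub(_indent, text)
```

Calling indent_code_blocks(text) on the sample post should return the post with the tags removed and each code line tab-indented. Using a function as the replacement also means backslashes inside the code (for example a \n inside a printf string) are inserted literally rather than being interpreted as escapes in a replacement template.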