3

I want to convert some Markdown-like tags into HTML tags.

For example:

#Title1#
##title2##
Welcome to **My Home Page**

will be turned into

<h1>Title1</h1>
<h2>title2</h2>
Welcome to <b>My Home Page</b>

I just don't know how to do that. For Title1, I tried this:

#!/usr/bin/env python3
import re
text = '''
        #Title1#
        ##title2##
'''
p = re.compile('^#\w*#\n$')
print(p.sub('<h1>\w*</h1>',text))

but nothing changes; the output is still:

 #Title1#
 ##title2##

How can BBCode/Markdown-style markup like this be converted into HTML tags?

6
  • 7
    You should use a markdown parser Commented Oct 5, 2015 at 7:58
  • 2
    Look for some Markdown parser. A search for pypi markdown parser gives several results. I don't have any experience with them, so I think you should download them and try them out on some Markdown formatted text. Commented Oct 5, 2015 at 7:59
  • 1
    Thanks, but I want to know how these markdown languages work, and I would like to write my own markdown-style standard for my homepage with a Python 3 CGI program. Commented Oct 5, 2015 at 8:59
  • For this reason I would rather not solve the problem with a markdown package. Commented Oct 5, 2015 at 9:00
  • @BingSun: The actual parsing algorithm is described in detail in the CommonMark spec. If I remember correctly, it's a two-pass algorithm: the first pass identifies block constructs, and the second pass parses the rest. If you want to learn to write a parser, the best way is to look at how existing parsers are written. Commented Oct 5, 2015 at 9:13

2 Answers

4

Check this regex: demo

Here you can see how I substituted #...# with <h1>...</h1>. I believe you can extend this to double # and so on to cover other markdown features, but you should still listen to the comments from @Thomas and @nhahtdh and use a markdown parser. Using regexes in such cases is unreliable, slow and unsafe.

As for inline text such as **...** to <b>...</b>, you can try this regex with substitution: demo. I hope you can tweak this for other features like underlining and so on.
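
The linked demos are not reproduced here, so the following is only a minimal Python sketch of the kind of substitution described above, not the exact patterns from the demos. The names HEADING, BOLD and convert are hypothetical, and the patterns are assumptions that cover only the #...#, ##...## and **...** forms from the question.

```python
import re

# Hypothetical patterns, written for this sketch only.
# A heading is a line wrapped in the same number of '#' on both sides.
HEADING = re.compile(r'^(#{1,6})([^#\n]+)\1[ \t]*$', re.MULTILINE)
# Bold text is anything between a pair of '**' (non-greedy).
BOLD = re.compile(r'\*\*(.+?)\*\*')


def _heading(match):
    # The number of '#' characters decides the heading level.
    level = len(match.group(1))
    return '<h{0}>{1}</h{0}>'.format(level, match.group(2).strip())


def convert(text):
    # First handle whole-line constructs, then inline ones.
    text = HEADING.sub(_heading, text)
    return BOLD.sub(r'<b>\1</b>', text)


sample = "#Title1#\n##title2##\nWelcome to **My Home Page**"
print(convert(sample))
```

This sketch does not escape HTML in the input and ignores nesting, which is part of why a real markdown parser is the safer choice.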


4 Comments

Thanks, I will try a markdown parser later. I tried setting text='#title#' and calling print(p.sub('<h1>\1</h1>',text)), and the program returns <h1></h1>. What does \1 mean? How do I specify the content that should be kept unchanged?
\1 is a backreference to capture group 1. You will need to check how to do this in Python, as I am not familiar with this language.
@BingSun I added regex for **...** part, you might want to check this out too.
Thank you very much! It is very kind of you!
1

Your regular expression does not work because, in the default mode, ^ and $ match only the beginning and the end of the whole string, respectively.

'^'

(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline (my emph.)

'$'

Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.

(7.2.1. Regular Expression Syntax)
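
The behaviour quoted above can be checked with a short sketch that reuses the documentation's own 'foo1\nfoo2\n' example; the variable names are only illustrative.

```python
import re

text = "foo1\nfoo2\n"

# Default mode: $ matches only at the end of the string
# (or just before a trailing newline).
default_end = re.findall(r'foo.$', text)
# MULTILINE mode: $ also matches before every newline.
multiline_end = re.findall(r'foo.$', text, re.MULTILINE)

# ^ behaves the same way for line starts.
default_start = re.findall(r'^foo', text)
multiline_start = re.findall(r'^foo', text, re.MULTILINE)

print(default_end)      # ['foo2']
print(multiline_end)    # ['foo1', 'foo2']
print(default_start)    # ['foo']
print(multiline_start)  # ['foo', 'foo']
```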

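Add the flag re.MULTILINE in your compile line, and drop the \n before $: in this mode $ already matches just before each newline, whereas \n$ would only match a heading that is followed by an empty line or by the end of the string.

p = re.compile(r'^#(\w*)#$', re.MULTILINE)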

and it should work – at least for single words, such as your example. A better check would be

p = re.compile(r'^#([^#]*)#$', re.MULTILINE)

– any sequence that does not contain a #.

In both expressions, you need to add parentheses around the part you want to copy so you can use that text in your replacement code. See the official documentation on Grouping for that.
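
As a minimal sketch of how the group and the replacement fit together (the sample text and variable names are assumptions made for illustration): the replacement refers to group 1 as \1, and it should be a raw string, because in a plain Python string literal '\1' is the control character chr(1) rather than a backreference. The sketch also anchors with $ alone instead of \n$, since in MULTILINE mode $ already matches just before each newline, and it excludes newlines from the group so that a heading cannot span lines.

```python
import re

text = "#Title1#\n##title2##\n"

# The parentheses capture the heading text as group 1.
p = re.compile(r'^#([^#\n]*)#$', re.MULTILINE)

# r'...' keeps \1 as a backreference to group 1.
result = p.sub(r'<h1>\1</h1>', text)
print(result)

# Without the r prefix, '\1' is a single control character.
print('\1' == chr(1))  # True
```

Only the single-# line is rewritten here; the ##title2## line is left alone because this pattern expects exactly one # on each side.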

1 Comment

When you mention "single line mode", it's going to be confused with the s flag, which makes . match a newline. (Well, the naming is quite confusing, and I was bitten by it when I started out.) It's more accurate to say that, by default, ^ and $ match the beginning and the end of the whole string. You need MULTILINE mode (the m flag) to make them also match the beginning and the end of each line.
