I have a tricky string of HTML that includes several pre tags containing code (say, Python). The code inside them is also decorated with HTML tags that should be removed.

For example:

Some text.
<pre>
a = 5 <br/>
b = 3
</pre>
More text
<pre>
a2 = "<a href='something'>text</a>"
b = 3
</pre>
final text

I would like to strip all the HTML tags inside the pre blocks (these are likely to be basic tags: br, em, div, a, etc.). I do not need to parse the HTML; I know that regex cannot parse HTML. The desired result is:

Some text.
<pre>
a = 5
b = 3
</pre>
More text
<pre>
a2 = "text"
b = 3
</pre>
final text

I'd like to do this using PHP (with something like preg_replace). For example:

$html = "<html><head></head><body><div><pre class=\"some-css-class\">
         <p><strong>
         some_code = 1
         </p></strong>
         </pre></div></body>"; // Compacting things here, for brevity

$newHTML = preg_replace("/(.*?)<pre[^<>]*>(.*?)<\/pre>(.*)/Us", "$1".strip_tags("$2", '<p><a><strong>')."$3", $html);
echo $newHTML;

This example code obviously doesn't work, since: (1) it would handle only one pre tag, and (2) the call strip_tags("$2", '<p><a><strong>') is evaluated before preg_replace runs, so it never sees the captured text; it just returns the literal string "$2" instead of the matched text with its tags stripped.
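To make point (2) concrete, here is a minimal sketch of what PHP passes as the replacement argument in the attempt above. The variable name $replacement is only for illustration.

```php
<?php
// PHP evaluates the replacement argument before preg_replace() is called.
// "$1", "$2" and "$3" are plain strings here ($1 is not a valid variable
// name, so nothing is interpolated), and strip_tags() finds no tags in "$2".
$replacement = "$1" . strip_tags("$2", '<p><a><strong>') . "$3";

var_dump($replacement === '$1$2$3'); // bool(true)
```

So preg_replace only ever receives the plain backreference string, and strip_tags never touches the contents of the pre block.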

Any suggestions on how this could be done in PHP? Thanks.

1 Answer

You will need to use preg_replace_callback and call strip_tags in the callback body:

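// $s holds the input HTML string
$result = preg_replace_callback(
    '~(<pre[^>]*>)([\s\S]*?)(</pre>)~',
    // Keep the opening and closing pre tags; strip tags only from the body
    function ($m) { return $m[1] . strip_tags($m[2], ['p', 'b', 'strong']) . $m[3]; },
    $s
);
echo $result;

Output:

Some text.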
<pre>
a = 5
b = 3
</pre>
More text
<pre>
a2 = "text"
b = 3
</pre>
final text

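Note that the strip_tags call above strips all tags except p, b and strong. Passing the allowed tags as an array requires PHP 7.4 or later; on older versions, pass them as the string '<p><b><strong>' instead.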
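As a small sketch of the allowed-tags argument: strip_tags accepts the list either as a string of tags or, from PHP 7.4 on, as an array of tag names. The $snippet value below is made up for illustration.

```php
<?php
// Made-up input: a paragraph with an emphasised value.
$snippet = '<p>x = <em>1</em></p>';

// String form: accepted by all PHP versions.
$fromString = strip_tags($snippet, '<p>');

// Array form: PHP 7.4 and later only.
$fromArray = strip_tags($snippet, ['p']);

echo $fromString, "\n";                  // <p>x = 1</p>
var_dump($fromString === $fromArray);    // bool(true)
```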

RegEx Details:

  • (<pre[^>]*>): Match <pre...> and capture in group #1
  • ([\s\S]*?): Match 0 or more of any character (lazy) and capture in group #2. [\s\S] matches any character, including newlines.
  • (</pre>): Match </pre> and capture in group #3
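Putting the pieces together, here is a self-contained sketch that runs the same approach on the sample from the question. The helper name stripTagsInsidePre and its $allowedTags parameter are hypothetical, and the \b and i additions to the pattern are optional hardening rather than part of the original answer.

```php
<?php
/**
 * Hypothetical helper (the name is not from the answer): strips tags from
 * the body of every pre element and leaves the rest of the markup alone.
 * $allowedTags is handed to strip_tags(); '' means "strip everything".
 */
function stripTagsInsidePre(string $html, $allowedTags = ''): string
{
    return preg_replace_callback(
        // \b keeps the pattern from matching tags that merely start with
        // "pre"; the i flag also accepts <PRE>.
        '~(<pre\b[^>]*>)([\s\S]*?)(</pre>)~i',
        function (array $m) use ($allowedTags): string {
            // $m[1] and $m[3] are the pre tags themselves; only $m[2] is cleaned.
            return $m[1] . strip_tags($m[2], $allowedTags) . $m[3];
        },
        $html
    );
}

// Sample input taken from the question.
$html = <<<'HTML'
Some text.
<pre>
a = 5 <br/>
b = 3
</pre>
More text
<pre>
a2 = "<a href='something'>text</a>"
b = 3
</pre>
final text
HTML;

echo stripTagsInsidePre($html);
```

Because the callback runs once per match, every pre block is processed, not only the first. Passing '<p><b><strong>' as the second argument would keep those tags, as in the answer.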

2 Comments

Amazing! Could you please help explain the regex in the code?
Sure, let me add it in answer
