In "A Liberal, Accurate Regex Pattern for Matching URLs" I found the following regex:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
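To see what the tail of the pattern buys you, here is a small sketch that runs it through preg_match_all, with the literal parentheses in the tail escaped. The sample text is made up for illustration, and "~" is used as the delimiter only so that "/" needs no escaping.

```php
<?php
// The pattern from the article, wrapped in "~" delimiters (an arbitrary choice).
$pattern = '~\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))~';

// Hypothetical sample text: one URL ending in a parenthesized part,
// one bare www. host followed by a comma.
$text = 'See http://en.wikipedia.org/wiki/Rome_(disambiguation). Or www.example.com, maybe.';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches:
//   http://en.wikipedia.org/wiki/Rome_(disambiguation)
//   www.example.com
print_r($matches[0]);
```

The trailing "." and "," are left out of the matches because the last character has to be either a closing parenthesized group, a non-punctuation character, or a slash.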
Solution
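/**
 * @param string $str the string to encode and parse for URLs
 * @return string the encoded string with every URL wrapped in an anchor
 */
function preventXssAndParseAnchors(string $str): string
{
    $url_regex = "/\b((https?:\/\/?|www\.)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))/";

    // Encode HTML special characters first to prevent XSS,
    // then turn the URLs into anchors.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    // Replace each match exactly once. A str_replace() loop over the matches
    // would wrap a URL again if it occurs twice or is part of a longer URL.
    $result = preg_replace_callback($url_regex, function (array $m): string {
        return "<a href='{$m[0]}'>{$m[0]}</a>";
    }, $str);

    // preg_replace_callback() returns null on a PCRE error;
    // fall back to the encoded string in that case.
    return $result ?? $str;
}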
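One thing the function above does not handle: a bare www. match ends up as href='www.google.com', which a browser resolves relative to the current page. Below is a minimal sketch of a variant that prepends a scheme to such matches. The name linkifyWithScheme is made up for this example, and defaulting to http:// is an assumption; pick whatever fits your site.

```php
<?php
/**
 * Hypothetical variant of preventXssAndParseAnchors(): same encoding and
 * matching, but bare "www." matches get a scheme in the href.
 */
function linkifyWithScheme(string $str): string
{
    $url_regex = '/\b((https?:\/\/?|www\.)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))/';

    // Encode first, so any markup in the input is neutralised before anchors are added.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    $result = preg_replace_callback($url_regex, function (array $m): string {
        $url = $m[0];
        // Assumption: http:// is an acceptable default scheme for "www." links.
        $href = strpos($url, 'www.') === 0 ? 'http://' . $url : $url;
        return "<a href='$href'>$url</a>";
    }, $str);

    // preg_replace_callback() returns null on a PCRE error.
    return $result ?? $str;
}

echo linkifyWithScheme("and www.google.com");
// and <a href='http://www.google.com'>www.google.com</a>
```

The link text stays exactly as the user typed it; only the href changes.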
Example
<?php
$str = "
apply here https://ph.dbsd.com/job/dfvdfg/5444
<script> console.log('this is a hacking attempt hacking'); </script>
and www.google.com
also http://somesite.net
";
echo preventXssAndParseAnchors($str);
The output
apply here <a href='https://ph.dbsd.com/job/dfvdfg/5444'>https://ph.dbsd.com/job/dfvdfg/5444</a>
&lt;script&gt; console.log(&#039;this is a hacking attempt hacking&#039;); &lt;/script&gt;
and <a href='www.google.com'>www.google.com</a>
also <a href='http://somesite.net'>http://somesite.net</a>
Test: https://3v4l.org/85lsl