I'm trying to convert URLs into links, but not when they come right after src=". So far, I have this:

return preg_replace('@(?!^src=")(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>', $s);

It converts the URL, but it still does so when the URL comes right after src=".

2 Answers

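Make that a lookbehind assertion, and drop the ^ anchor; with the anchor in place, the check only applies when src=" sits at the very start of the string.

(?<!src=")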
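As a rough sketch of that change, here is the question's pattern with the lookahead swapped for a negative lookbehind and the ^ anchor removed. The function name linkifyOutsideSrc() is hypothetical, and the sample input is made up for illustration. Note that this guards only against the exact prefix src=" and not against other attributes or quoting styles.

```php
<?php
// Hypothetical helper: same pattern as in the question, except that
// (?!^src=") has become the negative lookbehind (?<!src=").
function linkifyOutsideSrc(string $s): string
{
    return preg_replace(
        '@(?<!src=")(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@',
        '<a href="$1" target="_blank">$1</a>',
        $s
    );
}

echo linkifyOutsideSrc('<img src="http://example.com/a.png"> see http://example.com/page');
// <img src="http://example.com/a.png"> see <a href="http://example.com/page" target="_blank">http://example.com/page</a>
```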

I must infer the intent of this task in the absence of a minimal verifiable example.

By leveraging a legitimate DOM parser, you can largely prevent the matching of non-text nodes which contain otherwise qualifying URL values.

The code below uses an XPath query to skip any URL text that is already the direct child of an <a> tag. Because it targets only text() nodes, tag attribute values such as src can never be replaced.
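To see which nodes such a query picks up, here is a small sketch on a made-up snippet. The markup and the $values variable are only for illustration. It collects the text nodes whose parent is not an <a> element; neither the link text nor the src attribute value is among them.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<div>one <a href="#">two</a> and <img src="http://example.com/y.png"> three</div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$xpath = new DOMXPath($dom);

// Collect the (trimmed) value of every text node whose parent is not <a>.
$values = [];
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $values[] = trim($textNode->nodeValue);
}

var_export($values); // contains 'one', 'and', 'three' but not 'two'
```

One caveat: not(self::a) only checks the direct parent, so text nested deeper inside a link (for example inside a <b> within an <a>) would still be selected.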

The interesting part happens while looping over the text nodes.

Use preg_match_all() to isolate one or more URLs in each text node, then create a new <a> element to replace the respective URL segment of text.

Use splitText() to "spit out" the leading portion of text before the URL: the current node keeps that leading text, and the returned node holds the URL and everything after it.

Use replaceChild() to replace that returned node with the new <a> node.

Use insertBefore() to re-add the text that originally followed the URL as a new text node immediately after the new <a> node.
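The three DOM calls can be tried in isolation on a single hand-built text node. This is only a sketch: the <p> element, the sample sentence, and the variable names are made up for illustration.

```php
<?php
$dom = new DOMDocument();
$p = $dom->appendChild($dom->createElement('p'));
$textNode = $p->appendChild($dom->createTextNode('see http://example.com/x now'));

$text = $textNode->nodeValue;
$url = 'http://example.com/x';
$pos = mb_strpos($text, $url);

$a = $dom->createElement('a');
$a->setAttribute('href', $url);
$a->appendChild($dom->createTextNode($url));

// splitText(): $textNode keeps "see "; the returned node holds the URL plus " now"
$urlAndTail = $textNode->splitText($pos);

// replaceChild(): swap the returned node for the new link
$p->replaceChild($a, $urlAndTail);

// insertBefore(): re-add the trailing text after the link
// ($a->nextSibling is null here, so the node is appended)
$tail = $p->insertBefore(
    $dom->createTextNode(mb_substr($text, $pos + mb_strlen($url))),
    $a->nextSibling
);

echo $p->childNodes->length; // 3: leading text, <a>, trailing text
```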

Code:

$html = <<<HTML
<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link http://example.com/number2 then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another HTTPS://www.example.net/booyah</p> and done
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$regex = '#\bhttps?://[-\w.]+(?::\d+)?(?:/(?:[\w/_.-]*(?:\?\S+)?)?)?#ui';
foreach ($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $text = $textNode->nodeValue;
    foreach (preg_match_all($regex, $text, $m) ? $m[0] : [] as $url) {
        $a = $dom->createElement('a', htmlspecialchars($url));
        $a->setAttribute('href', $url);
        $mbPosOfUrlInText = mb_strpos($text, $url);
        // regurgitate any leading text as a new preceding node
        // then replace remainder of text with new hyperlink
        $textNode->parentNode->replaceChild(
            $a,
            $textNode->splitText($mbPosOfUrlInText)
        );
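        // add any text after url as new text node after new hyperlink;
        // continue from that trailing node so that a further URL in the
        // same text is located relative to the text that is still unlinked
        $text = mb_substr($text, $mbPosOfUrlInText + mb_strlen($url));
        $textNode = $textNode->parentNode->insertBefore(
            $dom->createTextNode($text),
            $a->nextSibling
        );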
    }
}
echo $dom->saveHTML();

Output:

<div>
Some text <img src="https://example.com/foo?bar=food"> and
a raw link <a href="http://example.com/number2">http://example.com/number2</a> then
<a href="https://example.com/the/third/url">original text</a> ...
and <p>here's another <a href="HTTPS://www.example.net/booyah">HTTPS://www.example.net/booyah</a></p> and done
</div>
