I'm trying to find a regular expression that would allow me to replace the SRC attribute in an image. Here is what I have:

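function getURL($matches) {
  global $rootURL;
  // $matches[1] holds the captured src value
  return $rootURL . "?type=image&URL=" . base64_encode($matches[1]);
}

// The callback name is passed as a string; a bare getURL is an undefined constant in PHP 8
$contents = preg_replace_callback("/<img[^>]*src *= *[\"']?([^\"']*)/i", 'getURL', $contents);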

For the most part this works, except that everything in the tag before the src=" attribute is dropped when $contents is echoed to the screen. The SRC value itself is updated properly, and all of the attributes after the updated image URL are preserved.

I am not interested in using a DOM or XML parsing library, since this is such a small application.

How can I fix the regex so that only the value for SRC is updated?

Thank you for your time!

4 Answers

2

Use a lazy star instead of a greedy one.

This may be your problem:

/<img[^>]*src *= *[\"']?([^\"']*)/
         ^

Change it to:

/<img[^>]*?src *= *[\"']?([^\"']*)/

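This way, [^>]* matches as few characters as possible rather than as many as possible, so the first src in the tag is the one that gets matched. Note that this alone does not bring back the text before src: the callback's return value replaces the entire match, including the <img ... prefix, so that prefix still has to be captured and re-emitted.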
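
To see what the lazy quantifier changes, here is a minimal sketch. The sample tag and the variable names are made up for illustration; the two patterns are the ones shown above, written in single-quoted PHP strings.

```php
<?php
// Hypothetical sample tag in which "src" occurs twice.
$tag = '<img src="a.png" data-src="b.png">';

// Greedy: [^>]* takes as much as it can, then backtracks to the LAST "src".
preg_match('/<img[^>]*src *= *["\']?([^"\']*)/i', $tag, $greedy);

// Lazy: [^>]*? takes as little as it can, so it stops at the FIRST "src".
preg_match('/<img[^>]*?src *= *["\']?([^"\']*)/i', $tag, $lazy);

echo $greedy[1], "\n"; // b.png
echo $lazy[1], "\n";   // a.png
```

When a tag contains only one src, both patterns capture the same value.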

1

Do another grouping and prepend it to the return value?

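function getURL($matches) {
  global $rootURL;
  // $matches[1] is everything up to the src value, $matches[2] is the value itself
  return $matches[1] . $rootURL . "?type=image&URL=" . base64_encode($matches[2]);
}

// The callback name is passed as a string; a bare getURL is an undefined constant in PHP 8
$contents = preg_replace_callback("/(<img[^>]*src *= *[\"']?)([^\"']*)/i", 'getURL', $contents);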
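
A quick way to check this approach is the self-contained sketch below. The $rootURL value and the sample markup are placeholders, and an anonymous function stands in for the global-based callback; the pattern is the two-group one from this answer.

```php
<?php
// Placeholder values, not taken from the question.
$rootURL  = 'https://example.com/proxy.php';
$contents = '<p><img alt="logo" src="pic.png" width="5"></p>';

$result = preg_replace_callback(
    '/(<img[^>]*src *= *["\']?)([^"\']*)/i',
    function (array $m) use ($rootURL) {
        // $m[1]: everything from "<img" up to the opening quote of src
        // $m[2]: the original src value
        return $m[1] . $rootURL . '?type=image&URL=' . base64_encode($m[2]);
    },
    $contents
);

echo $result, "\n";
```

Because the prefix is captured and returned, the alt attribute before src survives, and the text after the src value is outside the match and is left alone.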

0

I am not interested in using a DOM or XML parsing library, since this is such a small application.

Nevertheless, that is the correct approach regardless of your application size.

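Remember, when you modify elements with DOMDocument, you should iterate in reverse to avoid unexpected oddities, in particular if you remove anything: the node list returned by getElementsByTagName is live, so removing a node while iterating forward shifts the indices of the nodes that follow it.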

Here's a working example using DOMDocument. It's more complicated than a regex, but not terribly difficult, and a lot more flexible and robust for any other tweaking that may be required.

function inner_html($node) {
    // Serialize each child so the wrapper element itself is not included
    $innerHTML = "";
    foreach ($node->childNodes as $child) {
        $innerHTML .= $node->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}
function replace_src($html) {
    $rootURL = 'https://example.com';
    $dom = new DOMDocument();
    // Convert UTF-8 text to entities so loadHTML does not misread it
    // (note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2)
    if (mb_detect_encoding($html, 'UTF-8', true) == 'UTF-8') {
        $html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
    }
    // Wrap the fragment in <body> so it has a single root element
    $dom->loadHTML('<body>' . $html . '</body>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    // Walk the node list from last to first
    for ($els = $dom->getElementsByTagName('img'), $i = $els->length - 1; $i >= 0; $i--) {
        $src = $els->item($i)->getAttribute('src');
        $els->item($i)->setAttribute('src', $rootURL . '?type=image&URL=' . $src);
    }
    return inner_html($dom->documentElement);
}

$html = '
    <div>
        <img src="test123">
        <img src="test456">
    </div>
';

echo replace_src($html);

OUTPUT:

<div>
    <img src="https://example.com?type=image&amp;URL=test123">
    <img src="https://example.com?type=image&amp;URL=test456">
</div>

0

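You can allow for optional whitespace too. Use this (written here with PHP modifiers; the g flag shown on regex101 is not valid in PHP, where preg_replace_callback already replaces every match):

/<\s*img[^>]*?src\s*=\s*(["'])([^"']+)\1[^>]*?>/iu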

https://regex101.com/r/jmMoio/1
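
Since this pattern matches the whole tag, using it with preg_replace_callback means putting the untouched parts back. The sketch below is one possible way to do that, not the exact pattern above: it adds two capture groups (prefix and suffix), so the backreference becomes \2. The $rootURL value and sample markup are placeholders.

```php
<?php
// Placeholder values for illustration only.
$rootURL  = 'https://example.com/proxy.php';
$contents = "<img  alt='x' src = 'pic.png' class=\"c\">";

// Regrouped variant: (1) prefix, (2) quote, (3) src value, (4) rest of the tag.
$pattern = '/(<\s*img[^>]*?src\s*=\s*)(["\'])([^"\']+)\2([^>]*?>)/iu';

$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($rootURL) {
        $newSrc = $rootURL . '?type=image&URL=' . base64_encode($m[3]);
        // Reassemble the tag, reusing the original quote character.
        return $m[1] . $m[2] . $newSrc . $m[2] . $m[4];
    },
    $contents
);

echo $result, "\n";
```

Reusing the captured quote character keeps single-quoted and double-quoted attributes as they were written.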
