I'm trying to clean up some pages from my blog and modify the image tags using preg_replace. Once an image has been cleaned, I add a data-updated attribute to avoid modifying it a second time.

$final =  preg_replace('/<img(.*?)>/', '<img$1 data-updated=\'1\'>', $final);

But the next time I run the cleaning, the data-updated attribute is added a second time. I could do a str_replace to remove the additional data-updated but I'd like to avoid adding it through a regex in the first place.

I have tried using [^data-updated] with no success. I also found a similar post here, "preg_replace expression can't include string", but replacing data-fancy with data-updated in that answer doesn't work.

Is there a way to add data-updated only if it's not already there? There are many other attributes in the tag, so I need to be able to check for the presence of data-updated anywhere in the img tag.

Here is an example of such an image:

<img srcset="xxx" src="yyy" loading="lazy" data-style="aspect-ratio:4/3;" data-placeholder="4-3" data-updated="y" alt="" width="100%">

Thanks! Laurent

    [^data-updated] is a negated character class: it matches any single character that is not one of a, d, e, p, t, u or -. You can't use that syntax to negate a string. If you do want to use regex, you'd want a negative lookahead instead. Commented Apr 4, 2022 at 19:38
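
For reference, the negative lookahead mentioned in the comment could look roughly like the sketch below. The sample string in $final and the variable $pattern are made up for illustration. It assumes that no attribute value contains a literal ">", that tags are not self-closing with "/>", and that no other attribute name contains "data-updated"; real-world HTML can break all three assumptions.

```php
<?php
// Hypothetical sample input: the first tag is already marked, the second is not.
$final = '<img src="a.jpg" data-updated="1"><img src="b.jpg" alt="">';

// (?![^>]*\bdata-updated\b) rejects the match when "data-updated" appears
// anywhere between "<img" and the next ">", so marked tags are left untouched.
$pattern = '/<img\b(?![^>]*\bdata-updated\b)([^>]*)>/i';

$final = preg_replace($pattern, '<img$1 data-updated="1">', $final);

echo $final, PHP_EOL;
```

Running the same replacement a second time should leave the string unchanged, because every img tag now fails the lookahead.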

1 Answer

HTML is notoriously difficult to process with regular expressions because it is a nested structure, not a regular language. More to the point, you need to test for the presence of an attribute before making a modification, and that is a job for a parser.

For this, PHP has the DOM extension. For example:

$html = <<<_E_
<html>
<head>
    <title>Hello world</title>
</head>
<body>
    <div>
        <h1>Hello World!</h1>
        <img src="/images/foo.jpg">
    </div>
    <div>
        <img someattr="yes" src="/images/bar.jpg">
    </div>
</body>
</html>
_E_;

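// Parse the document; LIBXML_HTML_NODEFDTD stops libxml from adding a default doctype.
$d = new DOMDocument();
$d->loadHTML($html, LIBXML_HTML_NODEFDTD);

// Add the attribute only to <img> elements that do not already have it.
foreach ($d->getElementsByTagName('img') as $node) {
    if (!$node->hasAttribute('someattr')) {
        $node->setAttribute('someattr', 'alsoyes');
    }
}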

echo $d->saveHTML();

Output:

<html>
<head>
    <title>Hello world</title>
</head>
<body>
    <div>
        <h1>Hello World!</h1>
        <img src="/images/foo.jpg" someattr="alsoyes">
    </div>
    <div>
        <img someattr="yes" src="/images/bar.jpg">
    </div>
</body>
</html>
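
Applied to the data-updated case from the question, the same idea might look like the sketch below. The helper name markImages and the sample fragment are hypothetical. It assumes the content is an HTML fragment with a single root element, which is why LIBXML_HTML_NOIMPLIED is added to keep libxml from wrapping it in html and body tags.

```php
<?php
// Hypothetical helper: adds data-updated="1" to every <img> that lacks the attribute.
function markImages(string $html): string
{
    $doc = new DOMDocument();
    // NOIMPLIED: no implied <html>/<body> wrappers; NODEFDTD: no default doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // Existing values such as data-updated="y" are left as they are.
        if (!$img->hasAttribute('data-updated')) {
            $img->setAttribute('data-updated', '1');
        }
    }

    return $doc->saveHTML();
}

// Sample fragment: the first image is already marked, the second is not.
$final = '<p><img src="yyy" data-updated="y" alt=""><img src="zzz" alt=""></p>';
$final = markImages($final);

echo $final;
```

Because the check is done on the parsed element, the position of data-updated among the other attributes does not matter, and running the helper again should not add a second copy.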

1 Comment

You are right, that's the best approach. I was already using a DOM parser to create the clean image; I now use it to mark the image as clean and wrap it in a div, without any regex. Thanks for pointing that out!
