
I load HTML using DOMDocument and then apply DOMXPath filters to it. Does either of these classes provide a way to distinguish an attribute with an empty-string value from an attribute without a value?

<input required="">
<input required>

I know that in both cases the required attribute is technically true, but when I call DOMDocument::saveHTML() I can see them serialized exactly as shown above. Now I want to select only the elements whose attribute has an empty-string value. The closest solution I found was

$xpath = new DOMXPath($dom);
$xpath->query("//*[@*[string-length()=0]]");

Unfortunately, this also matches attributes without a value.
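The following is a minimal reproduction of the problem. It is a sketch that assumes PHP's dom extension backed by libxml2's HTML parser with the default loadHTML() options. Both inputs match the XPath above, and the string-based DOM methods report the same empty value for both attributes, so nothing at the string level tells them apart.

```php
<?php
// Sketch: reproduce the problem with PHP's DOM extension (libxml2 HTML parser).
libxml_use_internal_errors(true); // keep libxml2 parser warnings out of the output

$dom = new DOMDocument();
$dom->loadHTML('<input required=""><input required>');
$xpath = new DOMXPath($dom);

// The query from the question: elements with at least one attribute whose string value is empty.
$matches = $xpath->query('//*[@*[string-length()=0]]');
echo 'matched elements: ', $matches->length, PHP_EOL;

foreach ($dom->getElementsByTagName('input') as $input) {
    // Through the string-valued APIs both attributes look identical:
    // the attribute is present and its value is an empty string.
    printf(
        "hasAttribute=%s getAttribute=%s\n",
        var_export($input->hasAttribute('required'), true),
        var_export($input->getAttribute('required'), true)
    );
}
```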

  • I don't think DOMDocument likes attributes without values (HTML5). If you import something like yours and then use saveXML() instead of the HTML version, you will see that both now have a blank attribute. Commented Feb 8, 2021 at 19:42
  • Your example XML is invalid? Commented Feb 8, 2021 at 19:44
  • @Luuk Yes, I load the HTML string using the loadHTML() method. Commented Feb 8, 2021 at 19:45
  • @JanTuroň: Sorry, .... (I hate webscraping.... 😖😉) Commented Feb 8, 2021 at 19:49

1 Answer


As I mentioned in the comment, DOMDocument will attempt to build a valid XML document from the HTML, and as far as I know it doesn't properly support HTML5-style attributes without values. So I'm not sure XPath alone will work in this case.

However, after some experimentation with the DOM underlying the document, it seems that although both elements have a required attribute node, only the required="" attribute has a (empty) text node holding its value; the bare required attribute node has no child at all.
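A minimal sketch of that experiment follows. It assumes PHP's dom extension on top of libxml2's HTML parser; the behaviour comes from libxml2 and may differ between libxml2 versions. It fetches each required attribute node with getAttributeNode() and checks whether the DOMAttr has a first child.

```php
<?php
// Sketch: inspect the DOMAttr nodes themselves instead of their string values.
libxml_use_internal_errors(true); // silence libxml2 parser warnings

$dom = new DOMDocument();
$dom->loadHTML('<input required=""><input required>');

$hasTextChild = [];
foreach ($dom->getElementsByTagName('input') as $input) {
    $attr = $input->getAttributeNode('required');
    // required=""  -> the DOMAttr has an (empty) text node as its first child
    // required     -> the DOMAttr exists but has no child nodes at all
    $has = ($attr->firstChild !== null);
    $hasTextChild[] = $has;
    printf("%s has text child: %s\n", $dom->saveHTML($input), var_export($has, true));
}
```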

So the following code checks for this situation:

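$a = '<input required="">
<input required>';

$dom = new DOMDocument();
// NOIMPLIED/NODEFDTD stop libxml2 from wrapping the fragment in <html><body> and adding a doctype
$dom->loadHTML($a, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Pre-filter: elements that have at least one attribute whose string value is empty
foreach ($xpath->query("//*[@*[string-length()=0]]") as $tag) {
    // required="" gets an (empty) text child; a bare required attribute has none
    if (isset($tag->attributes[0]->firstChild)) {
        echo "with attribute value:" . $dom->saveHTML($tag) . PHP_EOL;
    } else {
        echo "without attribute value:" . $dom->saveHTML($tag) . PHP_EOL;
    }
}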

which outputs:

with attribute value:<input required="">
without attribute value:<input required>

Note that the code uses attributes[0] only because it is purely for testing purposes. You will need to adapt this to your needs, for example by checking every attribute or just the specific attribute you care about.
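One way to adapt it, sketched under the same libxml2 assumption: the hypothetical helper elementsWithEmptyStringAttribute() below (not part of PHP's DOM API) keeps the XPath query as a pre-filter but checks every attribute of each match rather than only attributes[0]. The third input in the sample shows why this matters: its first attribute is a bare required, so looking at attributes[0] alone would misclassify it even though name="" has an explicit empty value.

```php
<?php
/**
 * Hypothetical helper (not a DOM API): returns elements that carry at least one
 * attribute written with an explicit empty value (e.g. required=""), while
 * ignoring valueless attributes such as a bare `required`.
 * Assumes libxml2 gives an explicit empty value an (empty) text child on the
 * DOMAttr and gives a bare attribute no child at all.
 *
 * @return DOMElement[]
 */
function elementsWithEmptyStringAttribute(DOMXPath $xpath): array
{
    $result = [];
    // XPath pre-filter: elements with at least one attribute whose string value is empty.
    foreach ($xpath->query('//*[@*[string-length()=0]]') as $element) {
        // Check every attribute instead of only attributes[0].
        foreach ($element->attributes as $attr) {
            if ($attr->value === '' && $attr->firstChild !== null) {
                $result[] = $element;
                break; // one matching attribute is enough for this element
            }
        }
    }
    return $result;
}

libxml_use_internal_errors(true); // silence libxml2 parser warnings
$dom = new DOMDocument();
$dom->loadHTML('<input required=""><input required><input required name="">');
$xpath = new DOMXPath($dom);

foreach (elementsWithEmptyStringAttribute($xpath) as $element) {
    echo $dom->saveHTML($element), PHP_EOL;
}
```

Keeping the XPath pre-filter means the per-attribute PHP loop only runs on candidate elements rather than on the whole document.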
