18

I use DOMDocument for editing some HTML files, but some of theme have in their names spaces. So DOMDocument automaticly change the spaces to %20 and then can't find them.

This is how looks the error exactly:

Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined in file:///C:/Path/To/The/File/01%20c%2040-1964.html, line: 11 in C:/Path/To/class.php on line 51

How to repair this error?

1

5 Answers 5

15

Use DOMDocument::loadHTMLFile() instead of load(). That's what it has been made for. HTML is not XML.

XML does not know the named entity  . However if you use loadHTML, the XML parser will get the HTML named entities loaded so the error goes away.

See as well: XML parser error: entity not defined.

Sign up to request clarification or add additional context in comments.

2 Comments

I have XML with HTML tags not properly inserted into it. I want to load it using load(), because it's XML. What can i do with that?
This doesn't work if the user is working with HTML snippets or incomplete documents.
7

If you load xml - use htmlentities() with flag ENT_XML1.

$offerXml->addChild('name', htmlentities($name, ENT_XML1));

Comments

1

Here is how to parse a XHTML snippet that contains HTML entities like  .

The example is from a piece of code that parses such XHTML exported from Confluence API, including custom elements like <ac:structured-macro>. This is why you need to set up namespaces. Change or remove the xmlns attributes to suit your use case.

$snippet = '<p>hello&nbsp;world</p>';

$entitydefs = XML_HTML_DEFS;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
$entitydefs
]>
<root
  xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/"
  xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/"
  xmlns="http://www.atlassian.com/schema/confluence/4/">
  $snippet
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

where XML_HTML_DEFS is this (you can remove the ones you know won't occur):

const XML_HTML_DEFS = <<<ENTITIES
<!ENTITY amp "&#38;">
<!ENTITY lt "&#60;">
<!ENTITY gt "&#62;">
<!ENTITY Agrave "&#192;">
<!ENTITY Aacute "&#193;">
<!ENTITY Acirc "&#194;">
<!ENTITY Atilde "&#195;">
<!ENTITY Auml "&#196;">
<!ENTITY Aring "&#197;">
<!ENTITY AElig "&#198;">
<!ENTITY Ccedil "&#199;">
<!ENTITY Egrave "&#200;">
<!ENTITY Eacute "&#201;">
<!ENTITY Ecirc "&#202;">
<!ENTITY Euml "&#203;">
<!ENTITY Igrave "&#204;">
<!ENTITY Iacute "&#205;">
<!ENTITY Icirc "&#206;">
<!ENTITY Iuml "&#207;">
<!ENTITY ETH "&#208;">
<!ENTITY Ntilde "&#209;">
<!ENTITY Ograve "&#210;">
<!ENTITY Oacute "&#211;">
<!ENTITY Ocirc "&#212;">
<!ENTITY Otilde "&#213;">
<!ENTITY Ouml "&#214;">
<!ENTITY Oslash "&#216;">
<!ENTITY Ugrave "&#217;">
<!ENTITY Uacute "&#218;">
<!ENTITY Ucirc "&#219;">
<!ENTITY Uuml "&#220;">
<!ENTITY Yacute "&#221;">
<!ENTITY THORN "&#222;">
<!ENTITY szlig "&#223;">
<!ENTITY agrave "&#224;">
<!ENTITY aacute "&#225;">
<!ENTITY acirc "&#226;">
<!ENTITY atilde "&#227;">
<!ENTITY auml "&#228;">
<!ENTITY aring "&#229;">
<!ENTITY aelig "&#230;">
<!ENTITY ccedil "&#231;">
<!ENTITY egrave "&#232;">
<!ENTITY eacute "&#233;">
<!ENTITY ecirc "&#234;">
<!ENTITY euml "&#235;">
<!ENTITY igrave "&#236;">
<!ENTITY iacute "&#237;">
<!ENTITY icirc "&#238;">
<!ENTITY iuml "&#239;">
<!ENTITY eth "&#240;">
<!ENTITY ntilde "&#241;">
<!ENTITY ograve "&#242;">
<!ENTITY oacute "&#243;">
<!ENTITY ocirc "&#244;">
<!ENTITY otilde "&#245;">
<!ENTITY ouml "&#246;">
<!ENTITY oslash "&#248;">
<!ENTITY ugrave "&#249;">
<!ENTITY uacute "&#250;">
<!ENTITY ucirc "&#251;">
<!ENTITY uuml "&#252;">
<!ENTITY yacute "&#253;">
<!ENTITY thorn "&#254;">
<!ENTITY yuml "&#255;">
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "&#161;">
<!ENTITY cent "&#162;">
<!ENTITY pound "&#163;">
<!ENTITY curren "&#164;">
<!ENTITY yen "&#165;">
<!ENTITY brvbar "&#166;">
<!ENTITY sect "&#167;">
<!ENTITY uml "&#168;">
<!ENTITY copy "&#169;">
<!ENTITY ordf "&#170;">
<!ENTITY laquo "&#171;">
<!ENTITY not "&#172;">
<!ENTITY shy "&#173;">
<!ENTITY reg "&#174;">
<!ENTITY macr "&#175;">
<!ENTITY deg "&#176;">
<!ENTITY plusmn "&#177;">
<!ENTITY sup2 "&#178;">
<!ENTITY sup3 "&#179;">
<!ENTITY acute "&#180;">
<!ENTITY micro "&#181;">
<!ENTITY para "&#182;">
<!ENTITY cedil "&#184;">
<!ENTITY sup1 "&#185;">
<!ENTITY ordm "&#186;">
<!ENTITY raquo "&#187;">
<!ENTITY frac14 "&#188;">
<!ENTITY frac12 "&#189;">
<!ENTITY frac34 "&#190;">
<!ENTITY iquest "&#191;">
<!ENTITY times "&#215;">
<!ENTITY divide "&#247;">
<!ENTITY forall "&#8704;">
<!ENTITY part "&#8706;">
<!ENTITY exist "&#8707;">
<!ENTITY empty "&#8709;">
<!ENTITY nabla "&#8711;">
<!ENTITY isin "&#8712;">
<!ENTITY notin "&#8713;">
<!ENTITY ni "&#8715;">
<!ENTITY prod "&#8719;">
<!ENTITY sum "&#8721;">
<!ENTITY minus "&#8722;">
<!ENTITY lowast "&#8727;">
<!ENTITY radic "&#8730;">
<!ENTITY prop "&#8733;">
<!ENTITY infin "&#8734;">
<!ENTITY ang "&#8736;">
<!ENTITY and "&#8743;">
<!ENTITY or "&#8744;">
<!ENTITY cap "&#8745;">
<!ENTITY cup "&#8746;">
<!ENTITY int "&#8747;">
<!ENTITY there4 "&#8756;">
<!ENTITY sim "&#8764;">
<!ENTITY cong "&#8773;">
<!ENTITY asymp "&#8776;">
<!ENTITY ne "&#8800;">
<!ENTITY equiv "&#8801;">
<!ENTITY le "&#8804;">
<!ENTITY ge "&#8805;">
<!ENTITY sub "&#8834;">
<!ENTITY sup "&#8835;">
<!ENTITY nsub "&#8836;">
<!ENTITY sube "&#8838;">
<!ENTITY supe "&#8839;">
<!ENTITY oplus "&#8853;">
<!ENTITY otimes "&#8855;">
<!ENTITY perp "&#8869;">
<!ENTITY sdot "&#8901;">
<!ENTITY Alpha "&#913;">
<!ENTITY Beta "&#914;">
<!ENTITY Gamma "&#915;">
<!ENTITY Delta "&#916;">
<!ENTITY Epsilon "&#917;">
<!ENTITY Zeta "&#918;">
<!ENTITY Eta "&#919;">
<!ENTITY Theta "&#920;">
<!ENTITY Iota "&#921;">
<!ENTITY Kappa "&#922;">
<!ENTITY Lambda "&#923;">
<!ENTITY Mu "&#924;">
<!ENTITY Nu "&#925;">
<!ENTITY Xi "&#926;">
<!ENTITY Omicron "&#927;">
<!ENTITY Pi "&#928;">
<!ENTITY Rho "&#929;">
<!ENTITY Sigma "&#931;">
<!ENTITY Tau "&#932;">
<!ENTITY Upsilon "&#933;">
<!ENTITY Phi "&#934;">
<!ENTITY Chi "&#935;">
<!ENTITY Psi "&#936;">
<!ENTITY Omega "&#937;">
<!ENTITY alpha "&#945;">
<!ENTITY beta "&#946;">
<!ENTITY gamma "&#947;">
<!ENTITY delta "&#948;">
<!ENTITY epsilon "&#949;">
<!ENTITY zeta "&#950;">
<!ENTITY eta "&#951;">
<!ENTITY theta "&#952;">
<!ENTITY iota "&#953;">
<!ENTITY kappa "&#954;">
<!ENTITY lambda "&#955;">
<!ENTITY mu "&#956;">
<!ENTITY nu "&#957;">
<!ENTITY xi "&#958;">
<!ENTITY omicron "&#959;">
<!ENTITY pi "&#960;">
<!ENTITY rho "&#961;">
<!ENTITY sigmaf "&#962;">
<!ENTITY sigma "&#963;">
<!ENTITY tau "&#964;">
<!ENTITY upsilon "&#965;">
<!ENTITY phi "&#966;">
<!ENTITY chi "&#967;">
<!ENTITY psi "&#968;">
<!ENTITY omega "&#969;">
<!ENTITY thetasym "&#977;">
<!ENTITY upsih "&#978;">
<!ENTITY piv "&#982;">
<!ENTITY OElig "&#338;">
<!ENTITY oelig "&#339;">
<!ENTITY Scaron "&#352;">
<!ENTITY scaron "&#353;">
<!ENTITY Yuml "&#376;">
<!ENTITY fnof "&#402;">
<!ENTITY circ "&#710;">
<!ENTITY tilde "&#732;">
<!ENTITY ensp "&#8194;">
<!ENTITY emsp "&#8195;">
<!ENTITY thinsp "&#8201;">
<!ENTITY zwnj "&#8204;">
<!ENTITY zwj "&#8205;">
<!ENTITY lrm "&#8206;">
<!ENTITY rlm "&#8207;">
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
<!ENTITY lsquo "&#8216;">
<!ENTITY rsquo "&#8217;">
<!ENTITY sbquo "&#8218;">
<!ENTITY ldquo "&#8220;">
<!ENTITY rdquo "&#8221;">
<!ENTITY bdquo "&#8222;">
<!ENTITY dagger "&#8224;">
<!ENTITY Dagger "&#8225;">
<!ENTITY bull "&#8226;">
<!ENTITY hellip "&#8230;">
<!ENTITY permil "&#8240;">
<!ENTITY prime "&#8242;">
<!ENTITY Prime "&#8243;">
<!ENTITY lsaquo "&#8249;">
<!ENTITY rsaquo "&#8250;">
<!ENTITY oline "&#8254;">
<!ENTITY euro "&#8364;">
<!ENTITY trade "&#8482;">
<!ENTITY larr "&#8592;">
<!ENTITY uarr "&#8593;">
<!ENTITY rarr "&#8594;">
<!ENTITY darr "&#8595;">
<!ENTITY harr "&#8596;">
<!ENTITY crarr "&#8629;">
<!ENTITY lceil "&#8968;">
<!ENTITY rceil "&#8969;">
<!ENTITY lfloor "&#8970;">
<!ENTITY rfloor "&#8971;">
<!ENTITY loz "&#9674;">
<!ENTITY spades "&#9824;">
<!ENTITY clubs "&#9827;">
<!ENTITY hearts "&#9829;">
<!ENTITY diams "&#9830;">
ENTITIES;

2 Comments

excellent, this worked like a charm! solved a bunch of other problems also. In my case I used it to create a XML document like so $xml = new SimpleXMLElement("<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE root [$XML_HTML_DEFS]><root></root>", null, false);
After changing from <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr"> to just <html> my page looks odd... no styles or line-breaks now. But did fix it mostly!
0

Another option, if you have some control over the documents to be passed to DOMDocument is to use HTML entities ( &#160; ) or unicode characters ( U+00A0 ) instead of named characters ( &nbsp; ).

Assuming the following documents: Document 1 ( test_1.xml )

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space ( )</element>
    <element>this line has a non breakin space (&nbsp;)</element>
    <element>this line has an integral symbol (&int;)</element>
</root>

Document 2 ( test_2.xml )

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space (&#32;)</element>
    <element>this line has a non breakin space (&#160;)</element>
    <element>this line has an integral symbol (&#8747;)</element>
</root>

Document 3 ( test_3.xml )

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space ( )</element>
    <element>this line has a non breakin space (U+00A0)</element>
    <element>this line has an integral symbol (U+222B)</element>
</root>

And a php file that loads those documents: ( index.php )

<?php
    declare(strict_types=1);
    $xmlDoc = new DOMDocument();
    $xml_file = file_get_contents( "test_2.xml" );
    $xmlDoc->loadXML( $xml_file );
?>

The first document will generate warnings like the following:

Warning: DOMDocument::loadXML(): Entity 'nbsp' not defined in Entity, line: 4 in /path/to/file/index.php on line 5

Warning: DOMDocument::loadXML(): Entity 'int' not defined in Entity, line: 5 in /path/to/file/index.php on line 5

But documents 2 and 3 will be parsed fine.

Reference:

Comments

-2
$textHTML = '&lt;ul&gt; &lt;li&gt;Dentro da ordem jur&amp;iacute;dica
brasileira.&lt;/li&gt; &lt;/ul&gt;

save in the XML file like this:

htmlspecialchars($textHTML, ENT_QUOTES);

recover the file like this:

$doc->load(file.xml);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.