Suppose you have the following HTML:

<style><input><div name="myDiv"></div></style>

How should you load it into a PHP DOMDocument object? With $doc->loadHTML(), the problem is that the <div> is inside the <style> tag. With $doc->loadXML(), the problem is that the <input> tag is never closed.

Note: I can't edit the HTML, only the PHP used to parse it, because I'm scraping the page.

2 Answers

Try this:

$doc = new DOMDocument();
$doc->recover = true;
// $response holds the scraped markup as a string
$doc->loadXML($response);

Setting $doc->recover = true tells DOMDocument to try to parse documents that are not well-formed. See the documentation for more information.
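As a fuller sketch of the same idea, the snippet below wraps the recovery-mode load in a small helper and then looks elements up by their name attribute with XPath. The helper name findByName and the choice to silence libxml warnings with libxml_use_internal_errors() are my own assumptions, not part of the original answer, and the exact tree that recovery mode builds from malformed markup depends on the libxml version, so treat the result as something to inspect rather than rely on blindly.

```php
<?php
// Hypothetical helper (not from the original answer): load possibly
// malformed markup in recovery mode and return elements whose name
// attribute matches $name. Assumes $name contains no double quotes.
function findByName(string $markup, string $name): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->recover = true; // libxml-specific recovery mode
    $loaded = $doc->loadXML($markup);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@name="' . $name . '"]');
    if ($nodes === false) {
        return [];
    }

    return iterator_to_array($nodes);
}

// The sample markup from the question.
$markup = '<style><input><div name="myDiv"></div></style>';
foreach (findByName($markup, 'myDiv') as $node) {
    echo $node->nodeName, "\n";
}
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.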

Can't you turn the HTML into a string, explode it, and then stitch it back together with the closing tag?

1 Comment

This is just a small segment of the HTML I'm dealing with. There are tons of inputs and tons of invalid tags (like divs inside styles). This is just a small sample to show the problem.
