php Xpath getting innerHTML with innerHTML tags

Question

I have a HTML file formatted like this:

<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>

<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>

I wrote some PHP code to automatically get each p1 and its detail and insert them into my MySQL table.

This is my code:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DOMXPath($doc);
$subject = $xpath->query('//p');

// Print the text of every paragraph with class "p1".
for ($i = 0; $i < $subject->length; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1") {
        echo $subject->item($i)->nodeValue;
    }
}
...

This is not my full code, but the problem is:

echo $subject->item($i)->nodeValue;

Which gives me "detail important", without the <span> tag.

It is very important to keep the span tags around the "important" part of the detail. Is there any function that can do this without a headache?

Thanks in advance

I found this SO entry that I hope will help: stackoverflow.com/questions/6286362/…
– SGB, Commented Oct 22, 2011 at 17:37

user1008735 · Accepted Answer · 2011-10-29 20:27:27Z

1

I found the answer to my question :) Thanks to SimpleHTMLDOM

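// $html is assumed to be the document already parsed by SimpleHTMLDOM.
$detail = '';
foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            // Casting the element to a string yields its markup, tags included.
            $detail .= html_entity_decode($element);
            break;
    }
}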

The trick is in:

html_entity_decode($element);

answered Oct 29, 2011 at 20:27

user1008735


Supreme Pizza · Accepted Answer · 2011-10-22 18:28:12Z

0

Whenever I need to parse HTML, I run it through SimpleHTMLDOM:

http://simplehtmldom.sourceforge.net/

I recommend using version 1.11. For various reasons, 1.5 is rather broken.

answered Oct 22, 2011 at 18:28

Supreme Pizza


2 Comments

hakre Over a year ago

SimpleHTMLDOM is defective by design, so I would not recommend it. Use something based on DOMDocument instead; see also: stackoverflow.com/questions/3606792/…

user1008735 Over a year ago

The same problem here. It only has $element->plaintext, so if there is any tag in the element, it will be extracted as plain text :(

Marco Marsala · Accepted Answer · 2020-03-13 15:45:40Z

0

Old question, but there is a one-liner. The OP should use:

$subject = $xpath->query('//p/*');

and then:

echo $doc->saveHtml($subject->item($i));

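With the * you select the paragraph's child elements, so you get the markup without the wrapping paragraph tag; without the * you get the HTML including the wrapping paragraph. Note that * matches element nodes only, so text sitting directly inside the paragraph is skipped; use node() instead of * to include it.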
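To make the difference between the selectors concrete, here is a minimal sketch run against one of the question's paragraphs. The helper name serializeNodes is hypothetical (it is not part of PHP's DOM extension); it only loops over the query result and calls saveHTML() on each node.

```php
<?php
// Hypothetical helper, not part of the DOM extension: concatenates the
// serialized markup of every node in an XPath result.
function serializeNodes(DOMDocument $doc, DOMNodeList $nodes): string
{
    $out = '';
    foreach ($nodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML('<p class="p2">detail <span>important</span></p>');
$xpath = new DOMXPath($doc);

// Element children only: the leading text "detail " is not selected.
$elementsOnly = serializeNodes($doc, $xpath->query('//p/*'));

// All child nodes, text included: the paragraph's inner HTML.
$allChildren = serializeNodes($doc, $xpath->query('//p/node()'));

echo $elementsOnly, "\n";
echo $allChildren, "\n";
```

If the goal is the full inner HTML, as in the question, the node() variant is the one that keeps both the text and the span.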

Full example:

$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument(); // the constructor takes a version and encoding, not markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div/*'); // with * you get the div's child elements, without the surrounding div tag; without * you get the div itself
// query() returns a DOMNodeList, so serialize each node in turn
$innerHtml = '';
foreach ($nodes as $node) {
    $innerHtml .= $dom->saveHTML($node);
}
var_dump($innerHtml);

Output: ciao questa è una prova.
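Applied to the markup from the question, the same idea could look like the sketch below. It assumes every p1 paragraph is followed by its own p2 sibling, and the sample strings "subject A" and "subject B" are made up for illustration. The detail is collected with its tags so it can be stored as-is.

```php
<?php
$html = '<p class="p1">subject A</p>
<p class="p2">detail <span>important</span></p>
<p class="p1">subject B</p>
<p class="p2">detail<span>important</span></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//p[@class="p1"]') as $subject) {
    // First following sibling paragraph with class "p2", if there is one.
    $detail = $xpath->query('following-sibling::p[@class="p2"][1]', $subject)->item(0);

    // Serialize each child node (text and elements) to rebuild the inner HTML.
    $detailHtml = '';
    if ($detail !== null) {
        foreach ($detail->childNodes as $child) {
            $detailHtml .= $doc->saveHTML($child);
        }
    }

    $rows[] = ['subject' => $subject->textContent, 'detail' => $detailHtml];
}

print_r($rows);
```

Each entry of $rows then holds one subject and its detail markup, ready to be passed to a parameterized INSERT statement.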

answered Mar 13, 2020 at 15:45

Marco Marsala

