
I have an HTML file formatted like this:

<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>

<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>

I wrote PHP code that automatically gets each p1 and its detail so I can insert them into my MySQL table.

This is my code:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");

$xpath = new DOMXPath($doc);
$subject = $xpath->query('//p');

// Loop over every <p>; "< length" (not "length-1") so the last paragraph is included
for ($i = 0; $i < $subject->length; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1") {
        echo $subject->item($i)->nodeValue;
    }
}
...

This is not my full code, but the problem is:

echo $subject->item($i)->nodeValue;

This gives me only the text, detail important, without the <span></span> tags.
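A minimal sketch of why this happens, using the p2 markup from above: nodeValue returns only the node's text content, while DOMDocument::saveHTML() called with a node argument serializes that node with its tags. The inline string stands in for file.html here.

```php
<?php
// Markup taken from the question's p2 paragraph.
$html = '<p class="p2">detail <span>important</span></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$p = $doc->getElementsByTagName('p')->item(0);

// nodeValue returns only the text content: every tag is dropped.
$text = $p->nodeValue;

// saveHTML() with a node argument serializes that node, tags included.
$markup = $doc->saveHTML($p);

echo $text, "\n", $markup, "\n";
```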

It is very important to keep the span tags around the "important" part of the detail. Is there a function that can do this without a headache?

Thanks in advance


3 Answers


I found the answer to my question, thanks to SimpleHTMLDOM :)

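include 'simple_html_dom.php';          // SimpleHTMLDOM library file (adjust the path)
$html = file_get_html('file.html');     // parse the question's file.html

$detail = '';                           // initialize before appending with .=
foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            // the element is converted to a string (its outer HTML) here
            $detail .= html_entity_decode($element);
            break;
    }
}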

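The trick is in:

html_entity_decode($element);

Passing the element where a string is expected converts it to its outer HTML, <span> tags included; html_entity_decode() then decodes any HTML entities in that string.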
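For comparison, here is a sketch of the same p1/p2 pairing done with DOMDocument alone, with no SimpleHTMLDOM dependency. It assumes each p1 is immediately followed by its p2, as in the question's file, uses an inline string instead of file.html, and innerHTML() is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical innerHTML() helper: serializes every child node of $node
// (text and elements alike), so the wrapping tag is dropped but inner tags stay.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample input in the same shape as the question's file.html.
$source = '<p class="p1">subject</p>'
        . '<p class="p2">detail <span>important</span></p>'
        . '<p class="p1">subject</p>'
        . '<p class="p2">detail<span>important</span></p>';

$doc = new DOMDocument();
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

$rows = [];
$subject = null;
foreach ($xpath->query('//p') as $p) {
    switch ($p->getAttribute('class')) {
        case 'p1':
            $subject = $p->nodeValue;   // subject text only
            break;
        case 'p2':
            // Keep the detail's markup, including the <span> tags.
            $rows[] = ['subject' => $subject, 'detail' => innerHTML($p)];
            break;
    }
}

print_r($rows);
```

Each entry in $rows then holds one subject/detail pair, ready to be bound to an INSERT statement.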

Whenever I need to parse HTML, I run it through SimpleHTMLDOM:

http://simplehtmldom.sourceforge.net/

I recommend using version 1.11. For various reasons, 1.5 is rather broken.

2 Comments

SimpleHTMLDOM is defective by design, so I would not recommend it; I'd use something based on DOMDocument instead. See also: stackoverflow.com/questions/3606792/…
Same problem here. It only has $element->plaintext, so any tags inside the element are lost and only the plain text is extracted :(

Old question, but there is a one-liner. The OP should use:

$subject = $xpath->query('//p/*');

and then:

echo $doc->saveHtml($subject->item($i));

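With the * you get the paragraph's child elements (without the wrapping paragraph tag); without the * you get each paragraph including its wrapping tag. Note that * matches element nodes only, so text sitting directly inside the paragraph (such as "detail") is not included.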
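A small sketch to check what each query returns for the question's p2 paragraph. The //p/node() query is an addition here, not part of the answer above: node() matches text nodes as well as elements, so concatenating saveHTML() over its results gives the full inner HTML.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p class="p2">detail <span>important</span></p>');
$xpath = new DOMXPath($doc);

// '//p/*' matches only element children of <p>, so the text "detail " is not included.
$children = [];
foreach ($xpath->query('//p/*') as $node) {
    $children[] = $doc->saveHTML($node);
}

// '//p' matches the paragraph itself; saveHTML() keeps the wrapping <p> tag.
$outer = $doc->saveHTML($xpath->query('//p')->item(0));

// '//p/node()' matches every child node, text included: this is the inner HTML.
$inner = '';
foreach ($xpath->query('//p/node()') as $node) {
    $inner .= $doc->saveHTML($node);
}

var_dump($children, $outer, $inner);
```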

Full example:

$html = '<div><p>hi, this is a <b>test</b>.</p></div>';
$dom = new DOMDocument();                 // the constructor takes version/encoding, not the HTML
libxml_use_internal_errors(true);         // suppress warnings about malformed HTML
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// with * you get the div's child elements (no surrounding div tag); without * you get the div itself, tags included
$nodes = $xpath->query('.//div/*');
$innerHtml = $dom->saveHTML($nodes->item(0)); // saveHTML() expects a single DOMNode, not a DOMNodeList
var_dump($innerHtml);

Output: <p>hi, this is a <b>test</b>.</p>
