2

I am trying to modify HTML elements across a large set of HTML files.

I need the line breaks (<br> tags) between each image and the text that follows it removed. I am using PHP with a DOMDocument to access all the nodes, and I can get each node's path with getNodePath(). However, I cannot work out how to modify a particular element starting from its node path. Is this possible?

This is what I have as of now:

$dom = new DOMDocument();
$dom->loadHTMLFile($pathname);
$i = 0;
$allNodes = $dom->getElementsByTagName('*');
$tagNamesArray = array();
foreach ($allNodes as $node) {
    $tagNodePath = $node->getNodePath();
    $tagName = end(explode('/', $node->getNodePath()));
    $tagNamesArray[$i][1] = $tagName;
    $tagNamesArray[$i][2] = $tagNodePath;
    $i++;
}

checkForLines($tagNamesArray, $dom);

function checkForLines($tagsArray, $dom) {
    $xPath = new DOMXpath($dom);
    for ($i = 0; $i < (count($tagsArray) - 1); $i++) {
        if ($tagsArray[$i][1] == 'img' && $tagsArray[$i+1][1] == 'br') {
            echo $tagsArray[$i+1][2].'<br>';
            $lineTag = $xPath->query($tagsArray[$i+1][2]);
            $domElement = $dom->removeChild($lineTag);
        }
    }
}
  • Can you show us the HTML you're attempting to modify? Perhaps the intended resulting HTML as well. Commented May 18, 2012 at 15:08
  • You could also make it easier by not using the cumbersome raw DOMDocument for modification. phpQuery or QueryPath allow for qp($html)->find("div a")->wrap("<p class=new>"); for example. Commented May 18, 2012 at 15:16
  • @JonathanSampson Consider any image followed by some text (be it <p>, <h1>, etc.). Between the image and the text there are <br> tags, which I want removed. I already have a file iterator that finds the HTML files, but I am unable to modify their source. Commented May 18, 2012 at 15:17
  • @Guru: are you ever saving the file back to disk with $dom->saveHTMLFile($filename)? Commented May 18, 2012 at 15:23
  • @Guru Going on your comment here, I've provided an answer below. Commented May 18, 2012 at 15:44

2 Answers

3

...consider any image and after it some following text(be it <p> or <h1> etc...) between the image and text there are <br> tags which I want removed...

If this is all you want to do:

$dom = new DOMDocument;
$dom->loadHTML( "<img src='foo.png' /><br/><p>Hello World</p>" );

$img = $dom->getElementsByTagName("img");

foreach ( $img as $current ) {
    $sibling = $current->nextSibling;
    // nextSibling is null when the image is the last child of its parent
    if ( $sibling !== null && $sibling->nodeName === "br" ) {
        $current->parentNode->removeChild( $sibling );
    }
}

echo $dom->saveHTML();

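Which results in the following output (the doctype and the <html>/<body> wrappers that loadHTML() adds around a fragment are omitted here):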

<img src="foo.png"><p>Hello World</p>
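As for the node-path part of the question: a path returned by getNodePath() is an XPath expression, so it can be turned back into a node with DOMXPath::query(). Two details matter: query() returns a DOMNodeList rather than a single node, and removeChild() has to be called on the node's parent rather than on the document. The sketch below assumes sample markup of my own and collects the nodes first, because removing a node can change the paths of its later siblings.

```php
<?php
// Sample markup is an assumption for illustration; use loadHTMLFile() for real files.
$dom = new DOMDocument();
$dom->loadHTML("<html><body><img src='foo.png'><br><p>Hello World</p></body></html>");

$xpath = new DOMXPath($dom);

// Pass 1: resolve node paths to nodes while the tree is still unchanged.
$toRemove = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $next = $img->nextSibling;
    if ($next !== null && $next->nodeName === 'br') {
        $path = $next->getNodePath();           // an XPath string such as /html/body/br
        $node = $xpath->query($path)->item(0);  // query() returns a DOMNodeList
        if ($node !== null) {
            $toRemove[] = $node;
        }
    }
}

// Pass 2: remove each node through its own parent.
foreach ($toRemove as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```

Resolving the path only to get back the node you already hold is redundant here; it is shown purely to illustrate that the round trip from path to node works.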

2 Comments

Yes, thank you. My aim is to change a bunch of things, but this was the fundamental step I needed to understand. This works, but for some reason the image is not being displayed and the original file is not being overwritten. Any idea why?
@Guru If you want to write over the old file, you need to use the $dom->saveHTMLFile() method. With regard to the image not being displayed, make sure you use your own HTML and not the sample HTML I used in the answer. Instead of $dom->loadHTML(), use $dom->loadHTMLFile( $filename ).
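Putting the two comments above together, a load, modify, and save round trip might look like the following sketch. The helper name removeBreaksAfterImages() and the temporary demo file are assumptions made for illustration; in practice $filename would come from your own file iterator. Because real files usually contain whitespace between tags, the helper also skips whitespace-only text nodes and removes several consecutive <br> tags. It uses a return type declaration, so it assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: remove every <br> that directly follows an <img>,
 * ignoring whitespace-only text nodes in between.
 * Returns the number of <br> elements removed.
 */
function removeBreaksAfterImages(DOMDocument $dom): int
{
    $removed = 0;
    // Copy the live node list into an array before changing the tree.
    $images = iterator_to_array($dom->getElementsByTagName('img'));
    foreach ($images as $img) {
        $node = $img->nextSibling;
        while ($node !== null) {
            $next = $node->nextSibling; // remember this before a possible removal
            if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) === '') {
                // Whitespace between tags: leave it and keep looking.
            } elseif ($node->nodeName === 'br') {
                $node->parentNode->removeChild($node);
                $removed++;
            } else {
                break; // reached the following text or element
            }
            $node = $next;
        }
    }
    return $removed;
}

// Demo file created only so the sketch is self-contained.
$filename = tempnam(sys_get_temp_dir(), 'html');
file_put_contents(
    $filename,
    "<html><body><img src=\"foo.png\">\n<br>\n<br><p>Hello World</p></body></html>"
);

$dom = new DOMDocument();
$dom->loadHTMLFile($filename);          // read the existing file
$count = removeBreaksAfterImages($dom);
$dom->saveHTMLFile($filename);          // overwrite the original file

echo "Removed $count <br> tag(s)\n";
```

Relative image paths in the saved file are resolved against that file's location, so an image that fails to display usually means the src attribute points somewhere that does not exist relative to the file being viewed.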
0

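With the PHP Simple HTML DOM Parser library (which provides str_get_html()), you can create a DOM from a string and modify elements directly: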

$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

$html->find('div', 1)->class = 'bar';

$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html;
