
I'm using DOMDocument and XPath.

Given the following XML:

<Description>
    <CompleteText>
        <DetailTxt>
            <Text>
                <span>Here there is some text</span>
                <h2>And maybe a headline</h2>
                <br/>
                <span>Normal position</span>
                <br/>
                <span> </span>
                <br/>
            </Text>
        </DetailTxt>            
    </CompleteText>
</Description>

The node /Description/CompleteText/DetailTxt/Text contains markup that is, unfortunately, unescaped, and I can't change that. Is there any way to query that content while preserving the HTML markup?

What I tried

The obvious nodeValue, and also textContent. Both give me the content with the markup stripped.

  • What did you try? Commented Oct 4, 2019 at 16:00
  • Updated the OP. Commented Oct 5, 2019 at 0:03
  • What do you mean by "unfortunately unescaped"? You'd have to illustrate that in your example – at the moment everything is perfectly valid XML, including the HTML. Commented Oct 5, 2019 at 5:27
  • I understand that the source is perfectly valid XML ;) but the context switch (XML to HTML) should have been handled by the creator of the document by escaping the HTML, which unfortunately hasn't been done. Commented Oct 8, 2019 at 7:34
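To illustrate what the last comment is getting at, here is a minimal sketch. It uses a trimmed-down, hypothetical escaped variant of the <Text> content (not the actual input). When the HTML is escaped it is plain character data, so textContent returns the markup as a string; with the unescaped original, the tags are real child elements and textContent keeps only their text.

```php
<?php
// Sketch: compare textContent on unescaped markup (as in the question)
// with a hypothetical escaped variant of the same content.
$unescaped = '<Text><span>Here there is some text</span><br/></Text>';
$escaped   = '<Text>&lt;span&gt;Here there is some text&lt;/span&gt;&lt;br/&gt;</Text>';

$doc = new DOMDocument();

// <span> and <br/> are real child elements here, so textContent
// returns only their character data.
$doc->loadXML($unescaped);
$fromUnescaped = $doc->documentElement->textContent;

// Here the "tags" are just escaped characters inside a text node,
// so textContent returns them as a literal string.
$doc->loadXML($escaped);
$fromEscaped = $doc->documentElement->textContent;

echo $fromUnescaped, "\n"; // Here there is some text
echo $fromEscaped, "\n";   // <span>Here there is some text</span><br/>
```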

2 Answers


You can use the saveHTML method of DOMDocument to serialize a node as HTML. In your case you seem to want to call it on each child node of the selected node and concatenate the resulting strings; in the browser DOM APIs that would be called innerHTML, so I have written a function of that name that does exactly this. The following snippet also uses the ability to call PHP functions from XPath:

<?php
$xml = <<<'EOD'
<Description>
    <CompleteText>
        <DetailTxt>
            <Text>
                <span>Here there is some text</span>
                <h2>And maybe a headline</h2>
                <br/>
                <span>Normal position</span>
                <br/>
                <span> </span>
                <br/>
            </Text>
        </DetailTxt>            
    </CompleteText>
</Description>  
EOD;

$doc = new DOMDocument();

$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

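function innerHTML($nodeList) {
  // php:function() passes the selected node-set as an array of DOM nodes
  $node = $nodeList[0];
  $html = '';
  $containingDoc = $node->ownerDocument;
  // Serialize each child node as HTML and concatenate (the innerHTML equivalent)
  foreach ($node->childNodes as $child) {
    $html .= $containingDoc->saveHTML($child);
  }
  return $html;
}

// Allow XPath expressions to call innerHTML() through the php: prefix
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("innerHTML");

$innerHTML = $xpath->evaluate('php:function("innerHTML", /Description/CompleteText/DetailTxt/Text)');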

echo $innerHTML;

The output, as shown at http://sandbox.onlinephpfunctions.com/code/62a980e2d2a2485c2648e16fc647a6bd6ff5620b, is:

            <span>Here there is some text</span>
            <h2>And maybe a headline</h2>
            <br>
            <span>Normal position</span>
            <br>
            <span> </span>
            <br>

1 Comment

I found another (almost) working solution. Could you please look at my answer and comment? I will pick one based on your opinion. Thanks for your help! It's appreciated.

I get a good result using the C14N() method of DOMNode.

http://sandbox.onlinephpfunctions.com/code/90dc915c9a43c91d31fcd47d37e89df430951b2e

<?php
$xml = <<<'EOD'
<Description>
    <CompleteText>
        <DetailTxt>
            <Text>
                <span>Here there is some text</span>
                <h2>And maybe a headline</h2>
                <br/>
                <span>Normal position</span>
                <br/>
                <span> </span>
                <br/>
            </Text>
        </DetailTxt>            
    </CompleteText>
</Description>  
EOD;

$doc = new DOMDocument();

$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Select the <Text> element and serialize it as canonical XML (C14N).
// Note that the result includes the <Text> wrapper element itself.
$domNodes = $xpath->query('/Description/CompleteText/DetailTxt/Text');
$domNode = $domNodes[0];
$innerHTML = $domNode->C14N();

echo $innerHTML;

Result

<Text>
                <span>Here there is some text</span>
                <h2>And maybe a headline</h2>
                <br></br>
                <span>Normal position</span>
                <br></br>
                <span> </span>
                <br></br>
            </Text>

It seems shorter in a way; what do you think? I would need to get rid of the enclosing <Text> node, though. Thanks also for pointing me to the PHP sandbox.
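A minimal sketch of one way to drop the wrapper while staying with XML serialization: serialize only the child nodes of <Text> with DOMDocument::saveXML() and concatenate the results. The helper name innerXML is hypothetical, and the input is a compacted excerpt of the question's XML; with the indented original, the whitespace text nodes between the elements end up in the result as well.

```php
<?php
// Hypothetical helper: concatenate the XML serialization of each child,
// so the wrapper element itself is not included in the result.
function innerXML(DOMNode $node): string
{
    $xml = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that node
        $xml .= $node->ownerDocument->saveXML($child);
    }
    return $xml;
}

$doc = new DOMDocument();
$doc->loadXML('<Description><CompleteText><DetailTxt><Text><span>Normal position</span><br/><span> </span></Text></DetailTxt></CompleteText></Description>');

$xpath = new DOMXPath($doc);
$text = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);

$inner = innerXML($text);
echo $inner, "\n"; // <span>Normal position</span><br/><span> </span>
```

Empty elements keep their self-closing form here, because saveXML() writes XML rather than canonical XML or HTML.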

Update

I realize that C14N() changes the markup: <br/> becomes <br></br>.
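The change comes from the serializer, not from the document itself. A small sketch comparing the three serializations discussed in this thread on a single empty element (a hypothetical minimal input), assuming default DOMDocument settings:

```php
<?php
// Serialize the same empty <br/> element three different ways.
$doc = new DOMDocument();
$doc->loadXML('<Text><br/></Text>');
$br = $doc->documentElement->firstChild;

$asXml  = $doc->saveXML($br);  // XML: keeps the self-closing tag
$asHtml = $doc->saveHTML($br); // HTML: br is a void element, no end tag
$asC14n = $br->C14N();         // Canonical XML: always a start/end tag pair

echo $asXml, "\n";  // <br/>
echo $asHtml, "\n"; // <br>
echo $asC14n, "\n"; // <br></br>
```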

3 Comments

None of the approaches will or can ensure that you get the original markup, as that is not stored: what you get is a serialization of the DOM node structure back to a string. You said you consider the content HTML markup, which is why I suggested saveHTML. You also said this markup sits inside a non-HTML element that you don't want to output/serialize as well, which is why I only processed the child nodes and did not call saveHTML directly on $nodeList[0].
The rest of my solution just shows that you can call such functions directly from XPath, which you don't seem to need if your second approach suffices. Of course, calling $domNode->ownerDocument->saveHTML($domNode) directly from PHP would also have been possible, but, like your approach, it gives the outerHTML rather than the innerHTML. So the PHP function I wrote helps to get only the HTML serialization of the contents of the Text element; whether you need to be able to call it directly from XPath, I don't know.
To your first comment: yes, your output is closer to what I wanted. I went with your solution now because I don't need the outer element. To your second comment: I don't need to call it from within XPath, so I slightly adapted your solution but use your function, as I only need the inner content.
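For reference, a sketch of what such an adaptation might look like: the innerHTML() function from the first answer changed to accept a single DOMNode and called directly from PHP, with no registerPHPFunctions() involved. This is an assumption about the adaptation, not the asker's actual code, and the input is a shortened excerpt of the question's XML. As in the first answer's output, saveHTML() writes <br/> as <br>.

```php
<?php
// innerHTML() from the first answer, adapted (by assumption) to take a
// single node and to be called from plain PHP instead of php:function().
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node as HTML
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadXML('<Description><CompleteText><DetailTxt><Text><span>Here there is some text</span><h2>And maybe a headline</h2><br/></Text></DetailTxt></CompleteText></Description>');

$xpath = new DOMXPath($doc);
$text = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);

$inner = innerHTML($text);
echo $inner, "\n";
```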
