My brother and I are trying to parse XML content from a website using cURL and DOM.

I keep getting a blank value when I try to echo various parts of the DOM.

Here are some details:

  1. An example of a URL we fetch with cURL and parse with DOM: https://event.on24.com/eventRegistration/EventServlet?eventid=2062141&sessionid=1&key=FD3181776AA1D3051A0CE6249F1A391A&filter=eventsessionmediapresentationlogplayerxmlformateventrootmediabaseurldialininfomobileenvondemandexcludequestionexcludemessagesexcludeslides
  2. Notice that the URL is not a direct path to an XML file, but the page it returns contains XML content. Open the link and you'll see what I mean.
  3. I want to print the content between the tags.
  4. Either the way I am using cURL and DOM is wrong, or something else is.

I've tried echoing at various points in my code, but every echo prints nothing. When I echo $parsedcontent, it comes up blank.

Even an echo "Hello World" placed after the "foreach ... 'span' as ..." loop doesn't print anything.

$urlcontent = $event['url']; 
$chcontent = curl_init();
$timeoutcontent = 5;
curl_setopt($chcontent, CURLOPT_URL, $urlcontent);
curl_setopt($chcontent, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($chcontent, CURLOPT_CONNECTTIMEOUT, $timeoutcontent);
curl_setopt($chcontent, CURLOPT_SSL_VERIFYPEER, false);
$htmlcontent = curl_exec($chcontent);
$infocontent = curl_getinfo($chcontent);
curl_close($chcontent);

@$domcontent->loadXML($htmlcontent);

foreach($domcontent->getElementsByTagName('span') as $spanon24content) {
    # Get url and title from <a> tags
    $innerHTMLspan = ''; 
    $childrenspan  = $spanon24content->childNodes;

    foreach ($childrenspan as $childspan) { 
        $innerHTMLspan .= $divspanon24content->ownerDocument->saveXML($childspan);
    }
}
$parsedcontent = $innerHTMLspan;

echo $parsedcontent;
  • I think the answer on this question might point you in the right direction: stackoverflow.com/questions/6674322/… Commented Aug 7, 2019 at 21:26
  • Possible duplicate of How do you parse and process HTML/XML in PHP? Commented Aug 7, 2019 at 23:55
  • "I keep on getting a blank return value when I try to echo various aspect of the dom parts." When debugging, use var_dump() rather than echo to avoid this issue. Also make sure php.ini has error_reporting=E_ALL and display_errors=On (or alternatively, make sure the error log works, and read it after running your code). Commented Aug 7, 2019 at 23:57
  • "Try to click on the link, you'll see what I mean." What do you mean by "the link"? Your test XML page has 80 different links. Which of the 80 do you mean? Commented Aug 8, 2019 at 7:54
  • "I am wanting to print the content between the tags." Which tags are you talking about? The page has 3679 tags. Do you want the content between all of them? Commented Aug 8, 2019 at 8:00

1 Answer

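The span is inside an HTML fragment that is stored as a text node (a CDATA section) in the outer XML. To the XML parser this is just text, so getElementsByTagName('span') on the outer document finds nothing. You need to load (and parse) that text into a separate DOM document.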

$xml = <<<'XML'
<events>
  <eventkey>valid</eventkey>
  <nowdate>1565257004221</nowdate>
  <event>
    <eventAbstract><![CDATA[<p><span style="font-size:16px;">Scaling automation in your security environment can involve unnecessary time to clean up task completion notes as more incidents fly in.</span></p>

<p><span style="font-size:16px;">Join Gerald Trotman, CTP for IBM Resilient, in this tech session to learn how Resilient Task Helper Functions can help clean and consolidate notes to improve visibility into completed tasks and ultimately cut down the&nbsp;time to respond for your security team.</span></p>]]>
    </eventAbstract>
  </event>
</events>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

foreach ($xpath->evaluate('//eventAbstract') as $abstractNode) {
    // load the node content as HTML
    $htmlDocument = new DOMDocument();
    $htmlDocument->loadHTML($abstractNode->textContent);
    $htmlXpath = new DOMXPath($htmlDocument);

    // just read text content
    $innerText = $htmlDocument->textContent;

    // build up a (x)html fragment
    $innerHTML = '';
    foreach ($htmlXpath->evaluate('//span/node()') as $spanChildNode) {
        $innerHTML .= $htmlDocument->saveXML($spanChildNode);
    } 
    var_dump($innerText, $innerHTML);
} 
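
To connect this back to the cURL code in the question, here is a minimal sketch of the whole flow: fetch, parse the outer XML, then parse each embedded fragment as HTML. The function names fetchBody() and extractSpanMarkup() are hypothetical helpers invented for this illustration, and the sketch assumes the response keeps its HTML inside eventAbstract elements as in the sample above. Unlike the question's code, it leaves SSL verification on and throws instead of silently returning a blank value.

```php
<?php
// Hypothetical helper: fetch a URL and fail loudly instead of returning a blank string.
function fetchBody(string $url, int $timeout = 5): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure; echo would print that as nothing.
        throw new RuntimeException('cURL failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: return the inner markup of every <span> found in the
// HTML fragments stored as text inside <eventAbstract> elements (assumed layout).
function extractSpanMarkup(string $xml): array
{
    // Collect libxml errors internally instead of hiding them with @.
    $previous = libxml_use_internal_errors(true);
    try {
        $document = new DOMDocument();
        if (!$document->loadXML($xml)) {
            throw new RuntimeException('Response is not well-formed XML');
        }
        $xpath = new DOMXPath($document);
        $result = [];
        foreach ($xpath->evaluate('//eventAbstract') as $abstractNode) {
            $fragment = trim($abstractNode->textContent);
            if ($fragment === '') {
                continue; // loadHTML() rejects an empty string
            }
            // The fragment is plain text to the XML parser, so parse it separately.
            $htmlDocument = new DOMDocument();
            $htmlDocument->loadHTML($fragment);
            $htmlXpath = new DOMXPath($htmlDocument);
            foreach ($htmlXpath->evaluate('//span') as $span) {
                $inner = '';
                foreach ($span->childNodes as $child) {
                    $inner .= $htmlDocument->saveXML($child);
                }
                $result[] = $inner;
            }
        }
        return $result;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

Usage would look something like var_dump(extractSpanMarkup(fetchBody($event['url'])));, with var_dump() chosen so that false, null, and an empty array remain distinguishable. Note that $inner is reset per span and collected into an array, whereas the loop in the question overwrites $innerHTMLspan on every iteration and keeps only the last span.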