How to parse an XML node with a colon tag using PHP

Question

I am trying to fetch the value of the following nodes from [this URL (takes quite some time to load)][1]. The elements I'm interested in are:

title, g:price and g:gtin

The XML starts like this:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <link>http://www.photospecialist.de</link>
    <description/>
    <item>
      <g:id>BEN107C</g:id>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <description>
        Benbo Trekker Mk3 + Kugelkopf + Tasche Das Benbo Trekker Mk3 ist eine leichte Variante des beliebten Benbo 1. Sein geringes Gewicht macht das Trekker Mk3 zum idealen Stativ, wenn Sie viel draußen fotografieren und viel unterwegs sind. Sollten Sie in eine Situation kommen, in der maximale Stabilität zählt, verfügt das Benbo Trekker Mk3 über einen Haken an der Mittelsäule. An diesem können Sie das Stativ mit zusätzlichem Gewicht bei Bedarf beschweren. Dank der zwei besonderen Kamera-Befestigungsschrauben können Sie mit dem Benbo Trekker Mk3 sehr nah am Boden fotografieren. So nah, dass in vielen Fällen die einzige Einschränkung die Größe Ihrer Kamera darstellt. In diesem Set erhalten Sie das Benbo Trekker Mk3 zusammen mit einem Kugelkopf, Socket und einer Tasche für den sicheren und komfortablen Transport.
      </description>
      <link>
        http://www.photospecialist.de/benbo-trekker-mk3-kugelkopf-tasche?dfw_tracker=2469-16
      </link>
      <g:image_link>http://static.fotokonijnenberg.nl/media/catalog/product/b/e/benbo_trekker_mk3_tripod_kit_with_b__s_head__bag_ben107c1.jpg</g:image_link>
      <g:price>199.00 EUR</g:price>
      <g:condition>new</g:condition>
      <g:availability>in stock</g:availability>
      <g:identifier_exists>TRUE</g:identifier_exists>
      <g:brand>Benbo</g:brand>
      <g:gtin>5022361100576</g:gtin>
      <g:item_group_id>0</g:item_group_id>
      <g:product_type>Tripod</g:product_type>
      <g:mpn/>
      <g:google_product_category>Kameras &amp; Optik</g:google_product_category>
    </item>
  ...
  </channel>
</rss>

To get this, I have written the following code:

$z = new XMLReader;
$z->open('https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml');

$doc = new DOMDocument;

while ($z->read() && $z->name !== 'item')
    ;

while ($z->name === 'item')
{
    $node = new SimpleXMLElement($z->readOuterXML());
    $a = $node->title;
    $b = $node->price;
    $c = $node->gtin;
    echo $a . $b . $c . "<br />";
    $z->next('item');
}

This prints only the title; price and gtin come out empty.

My bad, you're using SimpleXMLElement to access the attributes with their own namespace. So the linked duplicate is not entirely correct (you could just use XMLReader::expand() to obtain the DOMElement directly, convert to DOM via dom_import_simplexml or for sure access the namespaced attributes via SimpleXML directly like in the linked Q&A in this comment).
– hakre, Commented Apr 26, 2015 at 11:27
@hakre...i can't use simplexml as the XML is large so XMLReader is to be used
– user3305327, Commented Apr 26, 2015 at 11:30
Huh? You actually use SimpleXML in your questions code. I was not speaking about switching away from XMLReader when I mentioned it.
– hakre, Commented Apr 26, 2015 at 11:42
@hakre...oops sorry...actually am very new to this XML coding...btw can you please help me with this problem
– user3305327, Commented Apr 26, 2015 at 11:50

hakre · Accepted Answer · 2015-04-26 12:17:26Z

12

The elements you're asking about are not part of the default namespace but in a different one. You can see that because they have a prefix in their name separated by the colon:

  ...
  <channel>
    <title>PhotoSpecialist.de</title>
    <!-- title is in the default namespace, no colon in the name -->
    ...
    <g:price>199.00 EUR</g:price>
    ...
    <g:gtin>5022361100576</g:gtin>
    <!-- price and gtin are in a different namespace, colon in the name and prefixed by "g" -->
  ...

The namespace is referred to by a prefix, "g" in your case. The namespace that this prefix stands for is defined on the document element:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">

So the namespace is "http://base.google.com/ns/1.0".
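If you want to check which prefixes a document declares instead of reading them off by eye, SimpleXML can list them. This is only a small sketch: the XML string is a trimmed-down, hypothetical stand-in for the real feed, and the variable names are my own.

```php
<?php
// Hypothetical, shortened copy of the feed, used only for illustration.
$xml = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <g:price>199.00 EUR</g:price>
    </item>
  </channel>
</rss>
XML;

$rss = new SimpleXMLElement($xml);

// getDocNamespaces() returns "prefix => namespace URI" pairs
// for the namespaces declared on the root element.
$namespaces = $rss->getDocNamespaces();

foreach ($namespaces as $prefix => $uri) {
    printf("%s => %s\n", $prefix, $uri);
}
```

The URI on the right-hand side is the value to pass to children() later on; the prefix is just a document-local shorthand for it.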

When you access the child-elements by their name with the SimpleXMLElement as you currently do:

$a = $node->title;
$b = $node->price;
$c = $node->gtin;

you're looking only in the default namespace. So only the first element actually contains text; the other two are created on-the-fly and are still empty.

To access the namespaced child-elements you need to tell the SimpleXMLElement explicitly with the children() method. It creates a new SimpleXMLElement with all the children in that namespace instead of the default one:

$google = $node->children("http://base.google.com/ns/1.0");

$a = $node->title;
$b = $google->price;
$c = $google->gtin;
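To try this without downloading the large feed, here is a self-contained sketch on a single item. The XML string is a hypothetical, cut-down item, and the variable names are only for illustration. It also shows that children() accepts the prefix instead of the URI when the second argument is true.

```php
<?php
// Hypothetical single <item>, reduced to the three fields of interest.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
  <g:price>199.00 EUR</g:price>
  <g:gtin>5022361100576</g:gtin>
</item>
XML;

$node = new SimpleXMLElement($xml);

// Plain property access only sees the unprefixed elements.
$title   = (string) $node->title;
$missing = (string) $node->price; // no unprefixed <price>, so this is ''

// Select the namespaced children by namespace URI ...
$byUri = $node->children('http://base.google.com/ns/1.0');
// ... or by prefix (second argument true means "first argument is a prefix").
$byPrefix = $node->children('g', true);

$price = (string) $byUri->price;
$gtin  = (string) $byPrefix->gtin;

printf("%s | %s | %s\n", $title, $price, $gtin);
```

Selecting by URI is usually the safer choice, because the feed's author is free to rename the prefix while the URI stays the same.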

So much for the isolated example (yes, that's it already).

A full example could then look like this (including node expansion on the reader; the code you had was a bit rusty):

<?php
/**
 * How to parse an XML node with a colon tag using PHP
 *
 * @link http://stackoverflow.com/q/29876898/367456
 */
const HTTP_BASE_GOOGLE_COM_NS_1_0 = "http://base.google.com/ns/1.0";

$url = 'https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml';

$reader = new XMLReader;
$reader->open($url);

$doc = new DOMDocument;

// move to first item element
while (($valid = $reader->read()) && $reader->name !== 'item') ;

while ($valid) {
    $default    = simplexml_import_dom($reader->expand($doc));
    $googleBase = $default->children(HTTP_BASE_GOOGLE_COM_NS_1_0);
    printf(
        "%s - %s - %s<br />\n"
        , htmlspecialchars($default->title)
        , htmlspecialchars($googleBase->price)
        , htmlspecialchars($googleBase->gtin)
    );

    // move to next item element
    $valid = $reader->next('item');
}

$reader->close();

I hope this both gives an explanation and broadens the view a little on XMLReader use as well.
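If you would rather experiment offline, the same loop can be run against an in-memory string. The following is a sketch under assumptions: the two-item XML is made up, and readItems() and GOOGLE_BASE_NS are hypothetical names of my own, not part of any library. It wraps the loop in a generator so each item comes back as a plain array.

```php
<?php
const GOOGLE_BASE_NS = 'http://base.google.com/ns/1.0';

// Hypothetical two-item sample standing in for the real feed.
$xml = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <item>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <g:price>199.00 EUR</g:price>
      <g:gtin>5022361100576</g:gtin>
    </item>
    <item>
      <title>Sample item without GTIN</title>
      <g:price>10.00 EUR</g:price>
    </item>
  </channel>
</rss>
XML;

/**
 * Hypothetical helper: streams <item> elements one at a time
 * and yields title, price and gtin as strings.
 */
function readItems(XMLReader $reader): Generator
{
    $doc = new DOMDocument;

    // Move to the first <item> element.
    while (($valid = $reader->read()) && $reader->name !== 'item');

    while ($valid) {
        // Expand only the current item into a small DOM subtree.
        $item   = simplexml_import_dom($reader->expand($doc));
        $google = $item->children(GOOGLE_BASE_NS);

        yield [
            'title' => (string) $item->title,
            'price' => (string) $google->price,
            'gtin'  => (string) $google->gtin,
        ];

        // Jump to the next <item>, skipping the current subtree.
        $valid = $reader->next('item');
    }
}

$reader = new XMLReader;
$reader->XML($xml); // for the real feed you would call open($url) instead

$items = iterator_to_array(readItems($reader), false);
$reader->close();

print_r($items);
```

Only one item is expanded at a time, so memory use should stay small even for a large feed. A missing element such as the second item's gtin simply comes back as an empty string.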


9 Comments

user3305327 Over a year ago

@hakre..thanks for such a nice informative post...it's a tutorial for me, thanks once again

hakre Over a year ago

An even better variant might be with using DOMXpath. But I have remembered this too late now :) ThW had such an example with XMLReader, I take a look if I find a link. --- Edit: here it is, the example fits really nicely: stackoverflow.com/a/23079179/367456

IMSoP Over a year ago

"only the first element actually contains text, the other two are created on-the-fly and are yet empty" - that's not really true; all the child elements or attributes are retrieved on-demand (here, ultimately), it's just that the call to ->children($ns) or ->attributes($ns) tells SimpleXML which ones to retrieve. I find SimpleXML feels less surprising if you think of it as an API, like the DOM but simpler, rather than as objects which "contain" data.

hakre Over a year ago

@IMSoP; I like your description (I've read some of your recent answers in the SimpleXML tag, really very well written, makes me a bit jealous but hopefully my English profits from reading) but some of those elements are also created when accessed, at least when you write data into them: eval.in/319535 - that's what I meant with create on the fly. The original document didn't contain that element (this is for $b and $c in my answer above).

IMSoP Over a year ago

@hakre Ah, I think I see what you mean, but they won't be created just by reading them: eval.in/319537 Since the question is only about reading, the fact that you could create them by assigning a value is kind of by-the-by. Still, an interesting point that referencing them isn't invalid, just not useful for the current task. :)


revoke · Answer · 2021-02-10 07:56:48Z

0

If the main tag itself has a colon in its name (for example g:item), you must use

$xml->next($xml->localName);

to move to the next item element.
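A small sketch of that situation follows. The feed is hypothetical (the repeated element is written as g:item here, unlike in the question), and it rests on the assumption, which matches what I have seen, that next() compares its argument against the local name while the name property includes the prefix.

```php
<?php
// Hypothetical feed in which the repeated element itself carries a prefix.
$xml = <<<XML
<feed xmlns:g="http://base.google.com/ns/1.0">
  <g:item><g:gtin>111</g:gtin></g:item>
  <g:item><g:gtin>222</g:gtin></g:item>
</feed>
XML;

$reader = new XMLReader;
$reader->XML($xml);

$doc = new DOMDocument;

// The name property holds the qualified name, prefix included.
while (($valid = $reader->read()) && $reader->name !== 'g:item');

$gtins = [];
while ($valid) {
    $item    = simplexml_import_dom($reader->expand($doc));
    $gtins[] = (string) $item->children('http://base.google.com/ns/1.0')->gtin;

    // Pass the local name ("item") rather than "g:item".
    $valid = $reader->next($reader->localName);
}
$reader->close();

print_r($gtins);
```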

