
Hi all,

I have an inputXml.xml file like this:

<content>
    <item name="book" label="Book">
        <![CDATA[ book name ]]>
    </item>
    <item name="price" label="Price">
        <![CDATA[ 35 ]]>
    </item>
</content>

And when I use the following code to parse the XML file:

$obj = simplexml_load_string(file_get_contents($inputXml),'SimpleXMLElement', LIBXML_NOCDATA);
$json = json_encode($obj);
$inputArray = json_decode($json,TRUE);

I get an array like this:

[content] => Array
            (
                [item] => Array
                    (
                        [0] => book name
                        [1] => 35
                    )

            )

I am wondering: is it possible to get an associative array that uses the value of the "name" or "label" attribute as the key, like this:

[content] => Array
            (
                [item] => Array
                    (
                        [book] => book name
                        [price] => 35
                    )

            )
  • PHP, perhaps? Please edit your question and add a suitable language tag. Commented Jul 31, 2014 at 12:17
  • @Damien_The_Unbeliever, yes, it is PHP, thanks for reminding me about that Commented Jul 31, 2014 at 12:21
  • You have to build the associative array yourself by iterating over the items. Commented Jul 31, 2014 at 12:25

2 Answers


First of all, you've been fooled by other code into thinking you need json_encode and json_decode to get an array out of a SimpleXMLElement. Instead, you only need to cast it to an array:

$inputArray = (array) $obj;

The remaining problem is that the array serialization you're looking for is not the default serialization SimpleXMLElement produces for that XML.

A further, minor problem is your dependency on LIBXML_NOCDATA: without that flag you wouldn't get anywhere near the format you want. Not depending on that flag (and therefore not on whether the underlying XML uses CDATA to encode element values) would be useful too, because it makes the code more robust.
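To illustrate why the flag matters less than it seems, here is a minimal sketch (assuming PHP 7 or later; the one-line XML is a trimmed-down copy of the question's input): casting an element to string returns its CDATA content whether or not LIBXML_NOCDATA was passed. The flag mainly changes what print_r() and json_encode() display.

```php
<?php
// Minimal sketch: the same CDATA-only element, loaded with and without LIBXML_NOCDATA.
$xml = '<item name="price"><![CDATA[ 35 ]]></item>';

$withFlag    = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
$withoutFlag = simplexml_load_string($xml);

// A string cast reads text and CDATA content alike, so both values are identical.
var_dump((string) $withFlag);    // string(4) " 35 "
var_dump((string) $withoutFlag); // string(4) " 35 "
```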

As SimpleXMLElement does not provide the behavior you want, you normally have two options: extend SimpleXMLElement or decorate it. I usually suggest decoration, as extension is limited. For example, through extension you cannot interfere with the (array) cast, although you can with JSON serialization. But that's not what you're looking for; you're looking for array serialization.
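For comparison, a sketch of the extension route for JSON, assuming PHP 7 or later: a subclass of SimpleXMLElement can implement JsonSerializable, and json_encode() will then use it. The class name ContentElement is made up for this example. This only changes json_encode(); an (array) cast of the same object still follows SimpleXML's default behavior, which is exactly the limitation described above.

```php
<?php
// Hypothetical class name. Every element loaded with this class shares
// jsonSerialize(), so this sketch only makes sense for the <content> root.
class ContentElement extends SimpleXMLElement implements JsonSerializable
{
    public function jsonSerialize(): array
    {
        $items = [];
        foreach ($this->item as $item) {
            // Key by the "name" attribute; trim the whitespace around the CDATA text.
            $items[(string) $item['name']] = trim((string) $item);
        }
        return ['item' => $items];
    }
}

$xml = <<<'XML'
<content>
<item name="book" label="Book"><![CDATA[ book name ]]></item>
<item name="price" label="Price"><![CDATA[ 35 ]]></item>
</content>
XML;

$obj = simplexml_load_string($xml, 'ContentElement');
echo json_encode($obj), "\n"; // {"item":{"book":"book name","price":"35"}}
```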

So, for a more or less standard array serialization of a SimpleXMLElement, you could implement this with a serializer plus a strategy object that decides how to array-serialize a specific element.

First, you need the serializer:

interface ArraySerializer
{
    public function arraySerialize();
}

class SimpleXMLArraySerializer implements ArraySerializer
{
    /**
     * @var SimpleXMLElement
     */
    private $subject;

    /**
     * @var SimpleXMLArraySerializeStrategy
     */
    private $strategy;

    public function __construct(SimpleXMLElement $element, SimpleXMLArraySerializeStrategy $strategy = NULL) {
        $this->subject  = $element;
        $this->strategy = $strategy ?: new DefaultSimpleXMLArraySerializeStrategy();
    }

    public function arraySerialize() {
        $strategy = $this->getStrategy();
        return $strategy->serialize($this->subject);
    }

    /**
     * @return SimpleXMLArraySerializeStrategy
     */
    public function getStrategy() {
        return $this->strategy;
    }
}

This array serializer does not do the actual serialization itself. That work is delegated to a strategy so that it can easily be exchanged later on. Here is a default strategy:

abstract class SimpleXMLArraySerializeStrategy
{
    abstract public function serialize(SimpleXMLElement $element);
}

class DefaultSimpleXMLArraySerializeStrategy extends SimpleXMLArraySerializeStrategy
{
    public function serialize(SimpleXMLElement $element) {
        $array = array();

        // create array of child elements if any. group on duplicate names as an array.
        foreach ($element as $name => $child) {
            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $this->serialize($child);
            } else {
                $array[$name] = $this->serialize($child);
            }
        }

        // handle SimpleXMLElement text values.
        if (!$array) {
            $array = (string)$element;
        }

        // return empty elements as NULL (self-closing or empty tags);
        // compare against '' so that a text value of "0" is not turned into NULL
        if ($array === '') {
            $array = NULL;
        }

        return $array;
    }
}

This object implements a common way to convert a SimpleXMLElement into an array. It behaves much like your SimpleXMLElement loaded with LIBXML_NOCDATA already does, but it does not have the CDATA problem. To show this, the following example already gives the output you currently get:

$obj        = new SimpleXMLElement($xml);
$serializer = new SimpleXMLArraySerializer($obj);
print_r($serializer->arraySerialize());

Now that the array serialization is implemented in types of its own, it's easy to change it to fit your needs. The content element needs a different strategy to turn it into an array, and that one is much simpler:

class ContentXMLArraySerializeStrategy extends SimpleXMLArraySerializeStrategy
{
    public function serialize(SimpleXMLElement $element) {
        $array = array();

        foreach ($element->item as $item) {
            $array[(string) $item['name']] = (string) $item;
        }

        return array('item' => $array);
    }
}

All that's left is to wire this into SimpleXMLArraySerializer under the right condition, e.g. depending on the name of the element:

...

    /**
     * @return SimpleXMLArraySerializeStrategy
     */
    public function getStrategy() {
        if ($this->subject->getName() === 'content') {
            return new ContentXMLArraySerializeStrategy();
        }

        return $this->strategy;
    }
}

Now the same example from above:

$obj        = new SimpleXMLElement($xml);
$serializer = new SimpleXMLArraySerializer($obj);
print_r($serializer->arraySerialize());

would give you the wanted output (beautified):

Array
(
    [item] => Array
        (
            [book]  => book name
            [price] => 35 
        )
)

As your XML probably has only this one element, I'd say this level of abstraction might be a little much. However, if the XML is going to change and you actually have multiple array-format needs within the same document, this is a plausible way to go.

The default serialization I've used in my example is based on the decoration example in SimpleXML and JSON Encode in PHP – Part III and End.


2 Comments

@hakre thanks for your answer. My XML actually has more elements, and it's true your solution is too much for me; I have already figured out a workaround. Yours is still helpful, though, because it's what I wanted in the first place; I just thought it could be done more easily.
You're basically transforming the structure of the data, so that always needs a mapping. It's most likely easier to do this based on XPath, then. That might be worth an example.
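Taking up the XPath idea, a minimal sketch (assuming PHP 7 or later, with the question's XML inlined as a nowdoc) could select only the item elements that carry a name attribute and map them directly. Because a string cast also returns CDATA text, LIBXML_NOCDATA is not needed here; trim() removes the whitespace around the CDATA values.

```php
<?php
$xml = <<<'XML'
<content>
<item  name="book" label="Book">
<![CDATA[ book name ]]>
</item>
<item  name="price" label="Price">
<![CDATA[ 35 ]]>
</item>
</content>
XML;

// No LIBXML_NOCDATA: (string) reads the CDATA content anyway.
$obj = simplexml_load_string($xml);

$map = [];
// Only <item> elements that have a "name" attribute are selected.
foreach ($obj->xpath('/content/item[@name]') as $item) {
    $map[(string) $item['name']] = trim((string) $item);
}
print_r(['item' => $map]);
```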

I took a quick look at the SimpleXMLElement docs, which showed that it is actually quite easy to construct an array like you want:

$xml = simplexml_load_file($file, 'SimpleXMLElement', LIBXML_NOCDATA);
$result = array(); // store assoc array here
foreach ($xml->item as $item) { // iterate over item nodes
    if (isset($item['name'])) { // attributes are accessible as array keys
        $result[(string) $item['name']] = (string) $item; // casts required!
    }
}
var_dump($result);

This works because SimpleXMLElement is traversable, so you can foreach over the item elements, and it exposes an element's attributes through array syntax. However, the values still need to be cast, because they are all instances of the SimpleXMLElement class themselves.
The code above is a simplified version of what I had written initially:

$xml = simplexml_load_file($fileName, 'SimpleXMLElement', LIBXML_NOCDATA);
$result = array();
foreach ($xml as $name => $node)
{
    if ($name === 'item')
    {
        $key = false;
        foreach ($node->attributes() as $attrName => $attr)
        {
            if ($attrName == 'name')
            {
                $key = (string) $attr;//attr is object, still
                break;
            }
        }
        if ($key !== false)
            $result[$key] = (string) $node;
    }
}

This works, too. However, the code is, I think you'll agree, quite messy. I'd stick with the first version I posted here.
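Since the question asks for either "name" or "label" as the key, the first version generalizes easily into a small function. The function name itemsByAttribute is hypothetical, and the sketch assumes PHP 7 or later (for the scalar type declarations). It also trims the values, because without LIBXML_NOCDATA the whitespace inside the CDATA sections is kept.

```php
<?php
/**
 * Hypothetical helper: map each <item> child of $root to its trimmed text,
 * keyed by the value of the attribute $keyAttr (e.g. "name" or "label").
 * Items that lack that attribute are skipped.
 */
function itemsByAttribute(SimpleXMLElement $root, string $keyAttr): array
{
    $result = [];
    foreach ($root->item as $item) {
        if (isset($item[$keyAttr])) {
            $result[(string) $item[$keyAttr]] = trim((string) $item);
        }
    }
    return $result;
}

$source = <<<'XML'
<content>
<item name="book" label="Book"><![CDATA[ book name ]]></item>
<item name="price" label="Price"><![CDATA[ 35 ]]></item>
</content>
XML;

$xml = simplexml_load_string($source);
print_r(itemsByAttribute($xml, 'name'));
print_r(itemsByAttribute($xml, 'label'));
```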


Initial answer (using DOMDocument)

I'll look into how to do this using SimpleXML, but for now, here's how I'd go about getting CDATA values using the DOMDocument API:

$dom = new DOMDocument;
$dom->load($file);
//get items
$items = $dom->getElementsByTagName('item');
$cData = array();
foreach ($items as $node)
{
    if ($node->hasChildNodes())
    {
        foreach ($node->childNodes as $cNode)
        {
            if ($cNode->nodeType === XML_CDATA_SECTION_NODE)
               $cData[] = $cNode->textContent;//get contents
        }
    }
}

Combine this with calls like $node->attributes->getNamedItem('name') to get a node's attribute, and $node->attributes->getNamedItem('name')->nodeValue to get that attribute's value.
I admit the DOMDocument API looks quite verbose (because it is), and it feels a bit clunky (as it always has), but it's really not too difficult to figure out once you've read the manual.
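Putting those pieces together, a DOMDocument sketch that keys the CDATA values by the name attribute might look like this (assuming PHP 7 or later; the question's XML is inlined so the example is self-contained):

```php
<?php
$xml = <<<'XML'
<content>
<item  name="book" label="Book">
<![CDATA[ book name ]]>
</item>
<item  name="price" label="Price">
<![CDATA[ 35 ]]>
</item>
</content>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$result = [];
foreach ($dom->getElementsByTagName('item') as $node) {
    // getNamedItem() returns null when the attribute is missing.
    $attr = $node->attributes->getNamedItem('name');
    if ($attr === null) {
        continue;
    }
    // Collect only CDATA children; the surrounding newline text nodes are ignored.
    $value = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_CDATA_SECTION_NODE) {
            $value .= $child->textContent;
        }
    }
    $result[$attr->nodeValue] = trim($value);
}
print_r($result);
```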
