
I have some XML which stores a lot of its information in attributes. Here is a small example:

<?xml version="1.0" encoding="UTF-8"?>
 <collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>04170npc a22003613u 4500</leader>
    <controlfield tag="001">vtls003932502</controlfield>
    <controlfield tag="003">WlAbNL</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1002</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
        <subfield code="a">WlAbNL</subfield>
        <subfield code="b">eng</subfield>
        <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
        <subfield code="a">Scott Blair Collection,</subfield>
        <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">rheology</subfield>
    </datafield>
  </record>
  <record>
    <leader>04229npc a22005893u 4500</leader>
    <controlfield tag="001">vtls003932503</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1004</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
       <subfield code="a">WlAbNL</subfield>
       <subfield code="b">eng</subfield>
       <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
       <subfield code="a">Celtic Collection,</subfield>
       <subfield code="f">17th century -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">Scottish Gaelic language</subfield>
    </datafield>
 </record>
</collection>

Currently I have a PHP script which just loads the entire document:

$xml = simplexml_load_file("Mapping_coll_wales.xml");
$records = $xml->record;

This creates a records array which looks something like this (I have cut it down to one record):

  SimpleXMLElement Object
(
[leader] => 04170npc a22003613u 4500
[controlfield] => Array
    (
        [0] => vtls003932502
        [1] => WlAbNL
    )
 [datafield] => Array
    (
        [0] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 035
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => (WlAbNL)1002
            )
        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 040
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => Array
                    (
                        [0] => WlAbNL
                        [1] => eng
                        [2] => WlAbNL
                    )

            )

        [2] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 245
                        [ind1] => 0
                        [ind2] => 0
                    )

                [subfield] => Array
                    (
                        [0] => Scott Blair Collection,
                        [1] => 1910 -
                    )
            )
        [3] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 653
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => rheology
            )
    )

)

Currently I'm just pulling the fields I need by presuming where in the array they are, and looping over each record (there are about 500):

for ($i =0; $i <5; $i++) {

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[2]->subfield . '<br />';
echo '<strong>tag</strong>  = :' . $records[$i]->datafield[3]->subfield . '<br />';


echo '<br />------------------------------------------------------------------------<br />';
}

However, it's possible that the XML may contain other tags, so I don't want to rely on the field being at index 2 and so on. Ideally I would like to be able to call it using something like:

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[245][a] . '<br />';

I'm sure it's fairly straightforward and I'm just missing something, but it would be good to be able to either load the tags as the array indices or have some way of directly getting a datafield by its tag and a subfield by its code, as those won't change.

Hope that makes sense.

Paul

1 Answer

You can use XPath to match elements that meet certain criteria.

However, because your nodes are in a namespace, you must register that namespace on each node on which you want to call xpath() with a namespaced path expression.
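As a minimal sketch of why the registration matters, the snippet below runs the same query with and without a prefix against a cut-down copy of the sample XML. The prefix `marc` is an arbitrary choice; only the namespace URI has to match the document.

```php
<?php
// Cut-down copy of the sample document, with the same default namespace.
$xml = simplexml_load_string(
    '<collection xmlns="http://www.loc.gov/MARC21/slim">'
    . '<record><datafield tag="245" ind1="0" ind2="0">'
    . '<subfield code="a">Scott Blair Collection,</subfield>'
    . '</datafield></record></collection>'
);
$record = $xml->record[0];

// An unprefixed name in XPath 1.0 only matches elements in no namespace,
// so this finds nothing even though the element is visibly there.
$plain = $record->xpath('datafield[@tag="245"]');

// After binding a prefix (any name works) to the namespace URI,
// the prefixed path matches.
$record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
$prefixed = $record->xpath('marc:datafield[@tag="245"]');

echo count($plain), ' unprefixed match(es), ', count($prefixed), " prefixed match(es)\n";
```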

See the example below, which processes each record in a loop.

$nsp = 'marc';
$nsuri = 'http://www.loc.gov/MARC21/slim';


$records = $xml->record;


foreach($records as $record) {
    $record->registerXPathNamespace($nsp, $nsuri);
    // Quote the tag so it is compared as a string, not as the number 245
    $datafields = $record->xpath('marc:datafield[@tag="245"]');
    foreach ($datafields as $datafield) {
        $datafield->registerXPathNamespace($nsp, $nsuri);
        $subfields = $datafield->xpath('marc:subfield[@code="a"]');
        var_dump($subfields);
    }
}
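To get closer to the `datafield[245][a]` style of access asked for in the question, the loop above could be wrapped in a small function. This is only a sketch: `marcSubfield()` is a hypothetical helper name, not part of SimpleXML, and it assumes the tag and code values contain no quote characters, since they are interpolated straight into the XPath expression.

```php
<?php
// Hypothetical helper: returns the first subfield value for a tag/code
// pair within one record, or null when the pair does not occur.
function marcSubfield(SimpleXMLElement $record, string $tag, string $code): ?string
{
    $record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
    $hits = $record->xpath(
        "marc:datafield[@tag='$tag']/marc:subfield[@code='$code']"
    );
    return $hits ? (string) $hits[0] : null;
}

// Cut-down copy of the sample document from the question.
$source = '<collection xmlns="http://www.loc.gov/MARC21/slim">'
    . '<record>'
    . '<datafield tag="245" ind1="0" ind2="0">'
    . '<subfield code="a">Scott Blair Collection,</subfield>'
    . '<subfield code="f">1910 -</subfield>'
    . '</datafield>'
    . '<datafield tag="653" ind1=" " ind2=" ">'
    . '<subfield code="a">rheology</subfield>'
    . '</datafield>'
    . '</record>'
    . '</collection>';
$xml = simplexml_load_string($source);

foreach ($xml->record as $record) {
    echo 'Title = ', marcSubfield($record, '245', 'a'), "\n";
    echo 'Tag = ', marcSubfield($record, '653', 'a'), "\n";
}
```

Because the lookup is by tag and code, extra datafields appearing before 245 no longer shift the result.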

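Alternatively, you can descend using only XPath instead of SimpleXML property access. Here are two methods which will give the same result. Note that calling xpath() on $records, as done here, evaluates the expression against the first record only; to cover every record, run the same code inside a foreach over the records.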

$records = $xml->record;
$records->registerXPathNamespace($nsp, $nsuri);

$tags = array('245', '653');
$codes = array('a', 'f');

// METHOD 1: run an xpath for each tag/code combination
$desiredfields = array();
foreach ($tags as $tag) {
    $desiredsubfields = array();
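    foreach($codes as $code) {
        $subfields = $records->xpath("marc:datafield[@tag='$tag']/marc:subfield[@code='$code']");
        // Skip combinations that do not occur (e.g. 653/f) rather than reading a missing index
        if (!empty($subfields)) {
            $desiredsubfields[$code] = (string) $subfields[0];
        }
    }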
    $desiredfields[$tag] = $desiredsubfields;
}

var_export($desiredfields);

// METHOD 2: create a single xpath expression that matches every subfield you want
// Then visit each subfield retrieving tag from parent
$tagexpr = implode(' or ', array_map(function($t){return "@tag='{$t}'";}, $tags));
$codeexpr = implode(' or ', array_map(function($c){return "@code='{$c}'";}, $codes));
$xpath = "marc:datafield[{$tagexpr}]/marc:subfield[{$codeexpr}]";

$desiredfields = array();
$subfields = $records->xpath($xpath);

foreach ($subfields as $subfield) {
    $datafield = $subfield->xpath('..');
    $datafieldcode = (string) $datafield[0]['tag'];
    $desiredfields[$datafieldcode][(string) $subfield['code']] = (string) $subfield;
}

var_export($desiredfields);
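If the goal is to have the tags and codes as array indices for every record, a variation that skips XPath altogether may be simpler. The sketch below is an assumption-laden alternative to the two methods above, not a replacement for them: `indexRecord()` is a hypothetical helper name, and it stores each tag/code pair as a list because a tag such as 653 can be repeated within a record.

```php
<?php
// Hypothetical helper: index one record as [tag][code] => list of values,
// using plain SimpleXML property and attribute access (no XPath needed).
function indexRecord(SimpleXMLElement $record): array
{
    $index = array();
    foreach ($record->datafield as $datafield) {
        $tag = (string) $datafield['tag'];
        foreach ($datafield->subfield as $subfield) {
            $code = (string) $subfield['code'];
            // Append, so repeated tags or codes are kept rather than overwritten
            $index[$tag][$code][] = (string) $subfield;
        }
    }
    return $index;
}

// Cut-down copy of the sample document from the question.
$xml = simplexml_load_string(
    '<collection xmlns="http://www.loc.gov/MARC21/slim">'
    . '<record>'
    . '<datafield tag="245" ind1="0" ind2="0">'
    . '<subfield code="a">Scott Blair Collection,</subfield>'
    . '<subfield code="f">1910 -</subfield>'
    . '</datafield>'
    . '<datafield tag="653" ind1=" " ind2=" ">'
    . '<subfield code="a">rheology</subfield>'
    . '</datafield>'
    . '</record>'
    . '</collection>'
);

foreach ($xml->record as $record) {
    $fields = indexRecord($record);
    // isset() guards against records that lack the field entirely
    if (isset($fields['245']['a'])) {
        echo 'Title = ', $fields['245']['a'][0], "\n";
    }
}
```

This walks every datafield of every record once, which for roughly 500 records should be cheap; the XPath versions remain the better fit when only one or two fields are needed.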

3 Comments

Thanks, but that doesn't seem to be working, as I'm looping over each record. I modified it slightly, but it still seems to return an array with -1 in it instead: $records[$i]->xpath('datafield[@tag="245"]/subfield[@code="a"]');
Your XML snippet does not seem to match your code: the XML doesn't have a records element, and your code uses records[$i] where I suspect you mean records->record[$i]. Consider including a more self-contained and faithful sample of your XML.
Thanks, I had tried to cut it down to make it clearer but might have missed something. I have added clearer XML and more details on how I'm using PHP.
