Hello Everyone,

I'm trying to access data in an XML file:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
 <responseDate>2013-04-15T12:14:31Z</responseDate>
 <ListRecords>
 <record>
 <header>
 <identifier>
 a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109
 </identifier>
 <datestamp>2012-08-16T14:42:52Z</datestamp>
 </header>
 <metadata>
 <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
 <dc:description>...</dc:description>
 <dc:date>1921</dc:date>
 <dc:identifier>K11510</dc:identifier>
 <dc:source>Waterschap Vallei & Eem</dc:source>
 <dc:source>...</dc:source>
 <dc:source>610</dc:source>
 <dc:coverage>Bunschoten</dc:coverage>
 <dc:coverage>Veendijk</dc:coverage>
 <dc:coverage>Spakenburg</dc:coverage>
 </oai_dc:dc>
 </metadata>
 <about>...</about>
 </record>

This is an example of the XML.

I need to access elements such as dc:date and dc:source.

Does anyone have any ideas?

Best regards, Tim

-- UPDATE --

I'm now trying this:

foreach( $xml->ListRecords as $records )
{
    foreach( $records AS $record )
    {
        $data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );

        $rows = $data->children( 'http://purl.org/dc/elements/1.1/' );

        echo $rows->date;

        break;
    }

    break;
}
4 Answers

You have nested elements that are in different XML namespaces. Specifically, two additional namespaces are involved:

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

The first one is for the <oai_dc:dc> element, which contains the elements of the second namespace: the <dc:*> elements such as <dc:description>. Those are the elements you're looking for.

Your code shows you already have the right idea of how this works:

$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );

$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );

However, there is a small mistake: the elements you assign to $data are not children of $record but of $record->metadata.

You also do not need to nest two foreach loops. A corrected example:

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';

$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

$records = $xml->ListRecords->record;

foreach ($records as $record)
{    
    $data = $record->metadata->children($nsUriOaiDc);

    $rows = $data->children($nsUriDc);

    echo $rows->date;

    break;
}

/** output: 1921 **/

If you run into problems like these, you can use $record->asXML('php://output'); to print the element you are currently positioned on.
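
Since the question also asks for dc:source, which occurs several times per record, here is a minimal self-contained sketch of the same approach that collects every value. The trimmed-down sample string and the names $sample and $sources are assumptions made for illustration, and the ampersand is escaped so that the string parses as XML.

```php
<?php
// Trimmed-down sample modelled on the question's XML (an assumption for this sketch).
$sample = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
     <dc:source>Waterschap Vallei &amp; Eem</dc:source>
     <dc:source>610</dc:source>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

$xml = simplexml_load_string($sample);

$sources = array();
foreach ($xml->ListRecords->record as $record) {
    // <oai_dc:dc> is a child of <metadata>, in the oai_dc namespace
    $dc = $record->metadata->children($nsUriOaiDc)->dc;

    // the <dc:*> elements are children of <oai_dc:dc>, in the dc namespace
    foreach ($dc->children($nsUriDc)->source as $source) {
        $sources[] = (string) $source;
    }
}

print_r($sources);
```

Casting each element to a string copies the text out of the SimpleXML object, so the resulting array holds plain strings rather than element references.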

2 Comments

I had the same problem, thanks so much for posting this solution. Saved me a great deal of time! :)
My problem is that I have to extract the namespace URLs from attributes of the header, but this is not working. I tried an example from here, but that didn't help: php.net/manual/en/simplexmlelement.attributes.php
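
On the follow-up about reading the namespace URIs from the document instead of hard-coding them: xmlns declarations are not ordinary attributes, so attributes() does not return them. One possible sketch uses SimpleXMLElement::getDocNamespaces(), which returns a prefix => URI array. The shortened sample string and the variable names are assumptions for illustration.

```php
<?php
// Shortened sample modelled on the question's XML (an assumption for this sketch).
$sample = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$xml = simplexml_load_string($sample);

// true = also include namespaces declared on descendant elements,
// not only those declared on the root element
$namespaces = $xml->getDocNamespaces(true);
print_r($namespaces);

// use the discovered URIs in place of hard-coded strings
$dc   = $xml->ListRecords->record->metadata->children($namespaces['oai_dc'])->dc;
$date = (string) $dc->children($namespaces['dc'])->date;
echo $date, "\n";
```

Note that the full document in the question declares the dc prefix twice with different URIs (once on the root and once on <oai_dc:dc>). A prefix-keyed array can hold only one URI per prefix, so in that case check which one you get, or call the method on the <oai_dc:dc> element itself.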
I think this is what you're looking for. Hope it helps ;)

3 Comments

Hey Julio, I tried that, but I think it doesn't work like that because it's a namespace inside a namespace.
@TimHanssen: No, that should not cause you any problems. You just need to do it again, once for each namespace.
So I tried using foreach( $xml->ListRecords as $records ) { foreach( $records AS $record ) { $data = $record->children( 'openarchives.org/OAI/2.0/oai_dc' ); $rows = $data->children( 'purl.org/dc/elements/1.1' ); echo $rows->date; break; } break; } and got the error: Warning: main(): Node no longer exists
You can use DOMDocument for this, for example to access dc:date:

  $STR='
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
 <responseDate>2013-04-15T12:14:31Z</responseDate>
 <ListRecords>
 <record>
 <header> <identifier> a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109 </identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
 <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:description>...</dc:description>
  <dc:date>1921</dc:date>
  <dc:identifier>K11510</dc:identifier>
  <dc:source>Waterschap Vallei & Eem</dc:source>
  <dc:source>...</dc:source>
  <dc:source>610</dc:source>
  <dc:coverage>Bunschoten</dc:coverage>
  <dc:coverage>Veendijk</dc:coverage>
  <dc:coverage>Spakenburg</dc:coverage>
 </oai_dc:dc>
</metadata>
<about>...</about>
</record>';

  // The sample contains a bare "&" and is truncated (</ListRecords> and
  // </OAI-PMH> are missing), so escape the ampersands and let libxml recover.
  $STR = str_replace("&", "&amp;", $STR);

  $dom = new DOMDocument;
  $dom->recover = true;
  @$dom->loadXML($STR); // @ suppresses the warnings about the truncated input

  // getElementsByTagName() matches on the local name, i.e. without the prefix
  print_r($dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('date')->item(0)->textContent);

output:

 1921

Or, to access dc:source:

 $source= $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('source');
 foreach($source as $value){
     echo $value->textContent."\n";
 }

output:

Waterschap Vallei & Eem
...
610

Or collect everything into an array:

 $array = array();
 $source = $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName("*");
 foreach ($source as $value) {
     // group the values by element name (localName omits the dc: prefix)
     $array[$value->localName][] = $value->textContent."\n";
 }
 print_r($array);

output:

 Array
(
   [description] => Array
    (
        [0] => ...

    )

   [date] => Array
    (
        [0] => 1921

    )

   [identifier] => Array
    (
        [0] => K11510

    )

   [source] => Array
    (
        [0] => Waterschap Vallei & Eem

        [1] => ...

        [2] => 610

    )

   [coverage] => Array
    (
        [0] => Bunschoten

        [1] => Veendijk

        [2] => Spakenburg

    )

)

Using XPath makes dealing with namespaces more straightforward:

<?php

// load the XML into a DOM document
$doc = new DOMDocument;
$doc->load('oai-response.xml'); // or use $doc->loadXML($xml) for an XML string

// bind the DOM document to an XPath object
$xpath = new DOMXPath($doc);

// map all the XML namespaces to prefixes, for use in XPath queries
$xpath->registerNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xpath->registerNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// identify each record using an XPath query
// collect data as either strings or arrays of strings
foreach ($xpath->query('oai:ListRecords/oai:record/oai:metadata/oai_dc:dc') as $item) {
    $data = array(
        'date' => $xpath->evaluate('string(dc:date)', $item), // $item is the context for this query
        'source' => array(),
    );

    foreach ($xpath->query('dc:source', $item) as $source) {
        $data['source'][] = $source->textContent;
    }

    print_r($data);
}
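
If you would rather stay with SimpleXML, the same idea of mapping namespace URIs to prefixes is available through SimpleXMLElement::registerXPathNamespace(). The following is only a sketch under that assumption. The shortened sample string and the variable names are made up for illustration, and the prefixes are local to the query, so they need not match the ones used in the document.

```php
<?php
// Shortened sample modelled on the question's XML (an assumption for this sketch).
$sample = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
     <dc:coverage>Bunschoten</dc:coverage>
     <dc:coverage>Veendijk</dc:coverage>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$xml = simplexml_load_string($sample);

// map the namespace URIs to prefixes for use in XPath queries
$xml->registerXPathNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xml->registerXPathNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$xml->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// xpath() returns an array of SimpleXMLElement objects
$coverage = array();
foreach ($xml->xpath('//oai:record/oai:metadata/oai_dc:dc/dc:coverage') as $node) {
    $coverage[] = (string) $node;
}

print_r($coverage);
```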
