I'm trying to display information from an XML file. I don't get any error, but the array is empty. I am using WordPress and don't have much experience with PHP, so I don't know if this is the best way to do it.

This is my code:

<?php 
function pubmedQuery() { 
    $xml = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]';
    $xml_file =  simplexml_load_file( $xml );
    $results_count = $xml_file->Count;
    $results_ids = array(); 
    foreach ( $xml_file->IdList->Id as $items ) {
        $results_ids[] = $items;
    }
    return "Hay " . $results_count . " resultados: " . $results_ids;
}
//Show results
echo '<h3>Resultados de búsqueda:</h3>' . pubmedQuery();
?>

And this is the output (in English: "Search results: There are 0 results: Array"):

Resultados de búsqueda: Hay 0 resultados: Array

Thanks, and excuse my English!

  • The XML returned doesn't actually contain any results, yet browsing to the XML URL shows them. I suspect the server hosting the content is detecting scraping and blocking it. Commented Nov 11, 2013 at 10:15
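One way to see what the server actually sent back is to separate the download from the parsing and check each step instead of trusting an empty loop. A minimal sketch, assuming you already have the response body as a string; the helper name `parseEsearch()` is hypothetical. `libxml_use_internal_errors(true)` keeps libxml from printing warnings so they can be read with `libxml_get_errors()`.

```php
<?php
// Hypothetical helper: returns ['count' => int, 'ids' => string[]],
// or null if $body is not well-formed XML.
function parseEsearch(string $body): ?array
{
    libxml_use_internal_errors(true);   // collect parse errors instead of printing them
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log('XML error: ' . trim($error->message));
        }
        libxml_clear_errors();
        return null;
    }
    $ids = array();
    // isset() avoids warnings when the response has no <IdList> at all.
    if (isset($xml->IdList->Id)) {
        foreach ($xml->IdList->Id as $id) {
            $ids[] = (string) $id;      // plain strings, not SimpleXMLElement objects
        }
    }
    return array('count' => (int) $xml->Count, 'ids' => $ids);
}
```

If this reports a count of 0 for a URL that shows results in the browser, dumping the raw body (for example with `var_dump($body)`) shows what the server returned to PHP.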

2 Answers

@Gavin is right. However, you can get the content with file_get_contents():

function pubmedQuery() { 
    $xml = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]';
    $content =  file_get_contents($xml);
    $xml_file = simplexml_load_string($content);
    $results_count = $xml_file->Count;
    $results_ids = array(); 
    foreach ( $xml_file->IdList->Id as $items ) {
        $results_ids[] = $items;
    }
    return "Hay " . $results_count . " resultados: " . implode("\n",$results_ids);
}
//Show results
echo'<h3>Resultados de búsqueda:</h3>' . pubmedQuery ();   

Output:

Hay 6 resultados: 19008416 18927361 18787170 18487186 18239126 18239125

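Note the call to implode("\n", $results_ids): it joins the found IDs into a single string (in the browser the "\n" separators render as spaces). Concatenating the array itself, as the question's code does, always prints the literal word "Array", whether or not any IDs were found.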
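To see the difference without hitting the network, here is a minimal sketch that parses a hard-coded XML string. The sample only mimics the `<Count>` and `<IdList>`/`<Id>` elements the code above reads (an assumed, trimmed-down shape, not a captured esearch response), and the helper name `formatResults()` is hypothetical. It also casts each `Id` to a string, so the array holds plain strings rather than SimpleXMLElement objects.

```php
<?php
// Hypothetical sample: only the elements the answer's code reads;
// a real esearch response contains more fields.
$sample = <<<XML
<eSearchResult>
  <Count>3</Count>
  <IdList>
    <Id>19008416</Id>
    <Id>18927361</Id>
    <Id>18787170</Id>
  </IdList>
</eSearchResult>
XML;

// Hypothetical helper: builds the same message as pubmedQuery().
function formatResults(string $xmlString): string
{
    $xml = simplexml_load_string($xmlString);
    $count = (int) $xml->Count;         // SimpleXMLElement -> int
    $ids = array();
    foreach ($xml->IdList->Id as $id) {
        $ids[] = (string) $id;          // store plain strings, not objects
    }
    // implode() joins the IDs; "..." . $ids would print the word "Array".
    return "Hay " . $count . " resultados: " . implode(" ", $ids);
}

echo formatResults($sample), "\n";
// prints: Hay 3 resultados: 19008416 18927361 18787170
```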

As per my comment, the website you are scraping appears to use user-agent detection.

function pubmedQuery() { 
    $context = stream_context_create(array(
      'http'=>array(
        'user_agent' => 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11'
       )
    ));

    $xml = file_get_contents('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]', FALSE, $context);

    $xml_file = simplexml_load_string($xml);
    $results_count = $xml_file->Count;
    $results_ids = array(); 
    foreach ( $xml_file->IdList->Id as $items ) {
        $results_ids[] = $items;
    }
    return "Hay " . $results_count . " resultados: " . implode("\n", $results_ids);
}
//Show results
echo'<h3>Resultados de búsqueda:</h3>' . pubmedQuery ();    

The code above spoofs the User-Agent header for the file_get_contents() call, so the website thinks the request comes from a normal browser.
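file_get_contents() returns false when the request fails, and the code above would then pass false straight to simplexml_load_string(). A minimal sketch that builds the same kind of User-Agent context and checks that return value; the function names are hypothetical, and the user-agent string you pass is your choice, not something the server is known to require.

```php
<?php
// Hypothetical helper: a stream context that sends the given
// User-Agent header with http:// requests.
function makeUserAgentContext(string $userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent,
        ),
    ));
}

// Hypothetical helper: downloads $url with that context and throws
// instead of silently handing false to simplexml_load_string().
function fetchWithUserAgent(string $url, string $userAgent): string
{
    $body = file_get_contents($url, false, makeUserAgentContext($userAgent));
    if ($body === false) {
        throw new RuntimeException("Request to $url failed");
    }
    return $body;
}
```

Usage would look like `$xml_file = simplexml_load_string(fetchWithUserAgent($url, 'Mozilla/5.0 ...'));`, with the exception telling you the download itself failed rather than the parsing.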

2 Comments

Thanks! Both answers are good, but I don't understand the user-agent detection problem. I will have to read about it.
It's quite possible that the server requires a user agent to work correctly, but it may also be that they deliberately hide results unless the request comes from what the server considers a valid browser. In most cases, file_get_contents(), simplexml_load_file(), etc. send the HTTP request without a User-Agent header, so spoofing one tells the server you are using, for example, Chrome or Firefox. HTH.
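If you would rather keep simplexml_load_file() from the question, PHP's libxml_set_streams_context() sets the stream context libxml uses when it loads documents, so the same user_agent option can be applied there. A minimal sketch; the function name `loadXmlWithUserAgent()` is hypothetical, and the context stays active for every later libxml document load in the same request, not just this one.

```php
<?php
// Hypothetical helper: like simplexml_load_file(), but libxml fetches
// the document with the given User-Agent header.
function loadXmlWithUserAgent(string $url, string $userAgent)
{
    $context = stream_context_create(array(
        'http' => array('user_agent' => $userAgent),
    ));
    // Affects all subsequent libxml document loads in this request.
    libxml_set_streams_context($context);
    return simplexml_load_file($url);   // SimpleXMLElement, or false on failure
}
```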
