
I am trying to parse the abstract from a PubMed XML file using XML::Simple with ForceArray. My code works when the abstract is split into an array of AbstractText elements, but not when it isn't. This is because in the array case I also use {content}, and when there is no array the {content} key is missing. The code is as follows:

use strict;
use warnings;
use LWP::Simple qw( get );
use XML::Simple;
use Data::Dumper;

open(my $fh, '>:encoding(UTF-8)', 'xmlparsed2.txt') or die "Cannot open xmlparsed2.txt: $!";

my $db1   = "pubmed";
my $query = "9915366";    # abstract split into several AbstractText elements with NlmCategory
my $q     = 16404398;     # abstract with a single AbstractText and no NlmCategory
my $xml   = XML::Simple->new;

# Swap $q for $query here to fetch the other record
my $urlxml  = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db1&id=$q&retmode=xml&rettype=abstract";
my $dataxml = get($urlxml) // die "Could not fetch $urlxml\n";
my $data    = $xml->XMLin($dataxml, ForceArray => [qw( MeshHeading Author AbstractText )], ForceContent => 1);
print $fh Dumper($data);

# One "NlmCategory:text" entry per AbstractText element
print $fh "Abstract: ".join "\n", map {join ":",($_->{NlmCategory},$_->{content})} @{$data->{PubmedArticle}->{MedlineCitation}->{Article}->{Abstract}->{AbstractText}};
print $fh "\n";
print $fh "Title: "."$data->{PubmedArticle}->{MedlineCitation}->{Article}->{ArticleTitle}\n";
print $fh "\n";
# '$$' in single quotes is a literal two-character separator
print $fh "MeSH: ".join '$$', map $_->{DescriptorName}{content}, @{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}};
print $fh "\n";
print $fh "Authors: ".join '$$', map {join " ",($_->{LastName},$_->{ForeName})} @{$data->{PubmedArticle}{MedlineCitation}{Article}{AuthorList}{Author}};
close $fh;

When the abstract is an array (replace $q in $urlxml with $query), I want each part of the abstract with its NlmCategory, like "Objective: To determine if the long...". The code above gives me the desired output, but with a hash reference at the end, as follows:

METHODS:Tertiary care outpatient and inpatient rehabilitation center directly attached to a university hospital.:HASH(0x69d0810).

For the abstract that is not an array ($q in $urlxml), this code doesn't seem to work, probably because there is no content key (I found this in the Data::Dumper output). I experimented a bit, and it sort of worked if I used just $_ for the array elements, but then it also printed the two colons (::). In short, I want my code to work for both $query and $q. Can you help?
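To see why the two IDs behave differently, here is a minimal sketch that uses two hypothetical, trimmed-down <Abstract> snippets in place of the real efetch responses (the helper name shapes is made up for illustration). Without ForceContent, XML::Simple turns an AbstractText that carries an NlmCategory attribute into a hash with a content key, but collapses an attribute-less AbstractText into a plain string, so $_->{content} only works in the first case.

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Hypothetical stand-ins for the two kinds of PubMed abstract
my $structured = '<Abstract>'
    . '<AbstractText NlmCategory="OBJECTIVE">To determine ...</AbstractText>'
    . '<AbstractText NlmCategory="METHODS">Tertiary care ...</AbstractText>'
    . '</Abstract>';
my $plain = '<Abstract><AbstractText>One unstructured paragraph.</AbstractText></Abstract>';

# Report what XML::Simple builds for each AbstractText when ForceContent is off
sub shapes {
    my ($xml) = @_;
    my $abstract = XMLin($xml, ForceArray => ['AbstractText']);
    return map { ref($_) ? 'hash' : 'string' } @{ $abstract->{AbstractText} };
}

print join(', ', shapes($structured)), "\n";    # elements with attributes
print join(', ', shapes($plain)), "\n";         # element without attributes
```

Dumping $data with Data::Dumper, as the question's code already does, shows the same difference on the real records.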

1 Answer


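Use ForceContent => 1. It makes XML::Simple represent the text of every element as a hash with a content key, even when the element has no attributes, so $_->{content} works for both $query and $q.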
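A minimal sketch of that fix, assuming trimmed sample abstracts rather than live PubMed data; format_abstract is a hypothetical helper, not part of XML::Simple. With ForceContent => 1 both shapes become hashes, and the category prefix is added only when NlmCategory is present, which also avoids the stray colon for the unstructured abstract.

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Hypothetical sample records standing in for the two PubMed responses
my %samples = (
    structured => '<Abstract>'
        . '<AbstractText NlmCategory="OBJECTIVE">To determine ...</AbstractText>'
        . '<AbstractText NlmCategory="METHODS">Tertiary care ...</AbstractText>'
        . '</Abstract>',
    plain => '<Abstract><AbstractText>One unstructured paragraph.</AbstractText></Abstract>',
);

sub format_abstract {
    my ($xml) = @_;
    # ForceContent => 1 turns every AbstractText into { content => ... },
    # with or without attributes, so ->{content} is always present
    my $abstract = XMLin($xml, ForceArray => ['AbstractText'], ForceContent => 1);
    my @lines;
    for my $item (@{ $abstract->{AbstractText} }) {
        # Only prefix the category when the element actually has one
        push @lines, defined $item->{NlmCategory}
            ? "$item->{NlmCategory}: $item->{content}"
            : $item->{content};
    }
    return join "\n", @lines;
}

print "$_:\n", format_abstract($samples{$_}), "\n\n" for sort keys %samples;
```

Note that ForceContent applies to every text-only element in the document, not just AbstractText, so other fields read from the same $data also need ->{content}.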

Or use XML::LibXML instead of XML::Simple:

use strict;
use warnings;
use feature qw( say );

use LWP::Simple qw( get );
use XML::LibXML qw( );
use URI         qw( );

binmode STDOUT, ':encoding(UTF-8)';

my $db = "pubmed";
my $id = $ARGV[0] || '9915366';

my $url = URI->new('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi');
$url->query_form(
   db      => $db,
   id      => $id,
   retmode => 'xml',
   rettype => 'abstract',
);

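# Fetch the efetch response; get() returns undef on failure
my $xml = get($url) // die "Could not fetch $url\n";

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $root = $doc->documentElement();   # <PubmedArticleSet>

# Works whether there is one AbstractText or several, with or without NlmCategory
for my $node ($root->findnodes('PubmedArticle/MedlineCitation/Article/Abstract/AbstractText')) {
   say join ':', $node->getAttribute('NlmCategory') // '', $node->textContent();
}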
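To try the XPath above without network access, here is a sketch that parses a hypothetical, heavily trimmed efetch-style document holding one structured and one unstructured abstract. Unlike the answer's code, it prints the category prefix only when the NlmCategory attribute exists, so the unstructured abstract does not start with a colon.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, trimmed efetch-style document; real responses have many more fields
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Abstract>
    <AbstractText NlmCategory="OBJECTIVE">To determine ...</AbstractText>
    <AbstractText NlmCategory="METHODS">Tertiary care ...</AbstractText>
  </Abstract></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Abstract>
    <AbstractText>One unstructured paragraph.</AbstractText>
  </Abstract></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>
XML

my $root = XML::LibXML->new->parse_string($xml)->documentElement;

my @lines;
for my $article ($root->findnodes('PubmedArticle')) {
    # XPath relative to each article; a missing attribute yields undef
    for my $node ($article->findnodes('MedlineCitation/Article/Abstract/AbstractText')) {
        my $category = $node->getAttribute('NlmCategory');
        push @lines, defined $category
            ? "$category: " . $node->textContent
            : $node->textContent;
    }
}
print "$_\n" for @lines;
```

Looping over PubmedArticle nodes also keeps the abstracts of different articles apart if several IDs are fetched in one request.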

5 Comments

Oops, nice to see you here. I am trying to ask questions the right way now. Thank you for your help and for the code, but I already have code that extracts other things along with the abstract. To be frank, I am a beginner, and it takes me a long time to understand the basics. I don't want to just get it working; I want to learn. I understood your code a bit, but adapting my original code to it would take a long time, so I will go with your first answer.
I tried using ForceContent => 1, as edited into the code above, but I am also extracting other fields (also edited into the code above). When I use ForceContent, those other fields, which I don't access through content, get messed up. Also, I am going to run this code in a loop over all the IDs I have. I hope that is clear.
@smandape, then add ->{content} where needed, as shown by the dump you create.
@smandape, By the way, I showed an alternative to XML::Simple because XML::Simple is the most difficult XML parser to use correctly.
Thank you for your help; this really worked. I will also try the alternative to XML::Simple. Thanks a lot.
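Following the advice to add ->{content} where needed, here is a sketch of how the question's other fields look once ForceContent => 1 is on. The document is a hypothetical, trimmed stand-in for an efetch response; the point is that ArticleTitle, LastName and ForeName also become hashes, which is why printing them directly shows HASH(0x...).

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Hypothetical, trimmed PubmedArticleSet document
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract>
          <AbstractText>One unstructured paragraph.</AbstractText>
        </Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# Same options as the question
my $data = XMLin($xml, ForceArray => [qw( MeshHeading Author AbstractText )], ForceContent => 1);
my $citation = $data->{PubmedArticle}{MedlineCitation};
my $article  = $citation->{Article};

# With ForceContent => 1 every text-only element is a hash,
# so each text value is read through ->{content}
my $title    = $article->{ArticleTitle}{content};
my @abstract = map { $_->{content} } @{ $article->{Abstract}{AbstractText} };
my @authors  = map { join ' ', $_->{LastName}{content}, $_->{ForeName}{content} }
    @{ $article->{AuthorList}{Author} };
my @mesh     = map { $_->{DescriptorName}{content} } @{ $citation->{MeshHeadingList}{MeshHeading} };

print "Title: $title\n", "Abstract: @abstract\n", "Authors: @authors\n", "MeSH: @mesh\n";
```

DescriptorName already needed ->{content} in the original code because it carries an attribute; ForceContent simply makes every other text field behave the same way.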
