
I want to extract some IDs (doi, pmcid and pmid) from the record tag of a .xml file using Python:

XML file:

<pmcids status="ok">
    <request idtype="doi" dois="" versions="yes" showaiid="no">
        <warning>no e-mail provided</warning>
        <warning>no tool provided</warning>
        <echo>ids=10.1371%2Fjournal.pone.0054577</echo>
    </request>
    <record requested-id="10.1371/JOURNAL.PONE.0054577"     pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
        <versions><version pmcid="PMC3557238.1" current="true"/>
        </versions>
    </record>
</pmcids>

I have tried the following Python code:

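import xml.etree.ElementTree as etree  # cElementTree was deprecated and removed in Python 3.9

with open('garbage_collector/tmp.xml', 'r') as xmlDoc:
    xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)

for ingredient in xmlDocTree.iter('record'):
    # ingredient[0] is the <versions> child element, which has no text of its own,
    # so this prints None instead of the IDs
    print(ingredient[0].text)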

I want the pmcid, doi and pmid as output, in the form of strings.
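For comparison, here is a minimal sketch using only the standard library (it is not taken from the answer below). The IDs are attributes of `<record>`, not child elements, so they are read with `Element.get()` rather than by indexing children. The helper name `extract_ids` and the inline `SAMPLE` string (trimmed from the question's file) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch is self-contained
SAMPLE = """<pmcids status="ok">
    <request idtype="doi" dois="" versions="yes" showaiid="no">
        <echo>ids=10.1371%2Fjournal.pone.0054577</echo>
    </request>
    <record requested-id="10.1371/JOURNAL.PONE.0054577" pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577">
        <versions><version pmcid="PMC3557238.1" current="true"/>
        </versions>
    </record>
</pmcids>"""


def extract_ids(xml_text):
    """Hypothetical helper: return one (pmcid, doi, pmid) tuple per <record> element."""
    root = ET.fromstring(xml_text)
    ids = []
    for record in root.iter("record"):
        # The IDs are attributes of the tag; .get() returns None if one is missing
        ids.append((record.get("pmcid"), record.get("doi"), record.get("pmid")))
    return ids


for pmcid, doi, pmid in extract_ids(SAMPLE):
    print(pmcid)
    print(doi)
    print(pmid)
```

To read the question's file instead of a string, `ET.parse('garbage_collector/tmp.xml').getroot()` returns the same kind of root element that `ET.fromstring()` does.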


1 Answer


If you can use BeautifulSoup, you could do

from bs4 import BeautifulSoup
soup = BeautifulSoup(input_xml, 'xml')  # the 'xml' parser requires lxml to be installed
t = soup.find('record')

where input_xml is the XML to be examined, as a string.

The find() call returns the first <record> tag, which we store in the variable t. Its attributes can then be accessed by indexing t like a dictionary.
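BeautifulSoup is a third-party package. If it is not available, the standard library's ElementTree offers the same dictionary-style access through `Element.attrib`, a plain dict of attribute names to string values. A small sketch, assuming a trimmed one-line version of the question's XML; the attribute name `example-missing-attr` is hypothetical and only demonstrates the missing-attribute case.

```python
import xml.etree.ElementTree as ET

xml_text = '<pmcids><record pmcid="PMC3557238" pmid="23382917" doi="10.1371/journal.pone.0054577"/></pmcids>'

# Unlike BeautifulSoup's find(), which searches all descendants,
# ElementTree's find("record") only matches direct children of the root;
# use ".//record" to search deeper.
record = ET.fromstring(xml_text).find("record")

attrs = record.attrib  # plain dict: attribute name -> string value
print(attrs["pmcid"])  # indexing raises KeyError if the attribute is absent
# .get() with a default avoids the KeyError ("example-missing-attr" is hypothetical)
print(record.get("example-missing-attr", "not present"))
```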

print(t['pmcid'])
print(t['doi'])
print(t['pmid'])

would print

PMC3557238
10.1371/journal.pone.0054577
23382917