
I am not familiar with XML files at all, but I am trying to parse this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
 <generator>
  <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel </i>
  <i name="platform" type="string">LinuxIFC </i>
  <i name="date" type="string">2019 07 11 </i>
  <i name="time" type="string">11:56:12 </i>
 </generator>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
  <i type="int" name="ISPIN">     2</i>
  <i type="int" name="NELMDL">    -8</i>
  <i type="int" name="IBRION">     2</i>
  <i name="EDIFF">      0.00001000</i>
  <i name="EDIFFG">     -0.01000000</i>
  <i type="int" name="NSW">   200</i>
  <i type="int" name="ISIF">     2</i>
  <i type="int" name="ISYM">     2</i>
  <i name="ENCUT">    750.00000000</i>
  <i name="POTIM">      0.30000000</i>
</incar>

So far, I have managed to write code that gets the elements:

#!/usr/bin/env python
import xml.etree.ElementTree as ET

tree = ET.parse("vasprun.xml")
root = tree.getroot()
for child in root:
  print({x for x in root.findall(child.tag)})

which gives this output:

{<Element 'generator' at 0x7f342220ca90>}
{<Element 'incar' at 0x7f342220cd10>}
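Each item in that output is an `Element` object, and printing one only shows its tag and memory address. The data lives in each element's `.tag`, `.attrib` (a plain dict of attributes) and `.text`. A minimal sketch that walks one level deeper, assuming a shortened copy of the sample is parsed from a string instead of reading `vasprun.xml`:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample; note the closing </modeling> tag is required.
SAMPLE = """<modeling>
 <generator>
  <i name="platform" type="string">LinuxIFC </i>
 </generator>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
 </incar>
</modeling>"""

root = ET.fromstring(SAMPLE)
for child in root:              # <generator>, then <incar>
    print(child.tag)
    for item in child:          # the <i> elements inside each section
        # .attrib is a dict; .text still carries the padding spaces
        print("   ", item.attrib, repr(item.text))
```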

I am trying to get the values inside incar in this form:

IStart=0
Prec=accurate

Can someone help me with this?
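Since the values are the `<i>` children of `<incar>`, one approach is to find that element and join each child's `name` attribute with its stripped text. A minimal sketch, assuming the document has its closing `</modeling>` tag; `incar_pairs` is a hypothetical helper name, and names are printed exactly as they appear in the file (`ISTART`, not `IStart`):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
  <i name="EDIFF">      0.00001000</i>
 </incar>
</modeling>"""


def incar_pairs(root):
    """Return 'NAME=value' strings for every <i> directly under <incar>."""
    incar = root.find("incar")      # first <incar> child of the root element
    if incar is None:
        return []
    # strip() removes the padding around values such as "     0"
    return [f"{i.get('name')}={(i.text or '').strip()}" for i in incar.findall("i")]


root = ET.fromstring(SAMPLE)
for line in incar_pairs(root):
    print(line)
```

For the real file, `root = ET.parse("vasprun.xml").getroot()` gives the same kind of root element.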

  • [{n.get("name"): n.text.strip() for n in node} for node in root] Commented Dec 8, 2021 at 19:21
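The one-liner in that comment builds one dict per top-level section. A sketch of a variant keyed by section tag, so the `incar` values can be looked up by name; it assumes every `<i>` has a `name` attribute, as in the sample, and `or ""` guards against empty elements whose `.text` is `None`:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<modeling>
 <generator>
  <i name="date" type="string">2019 07 11 </i>
 </generator>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
 </incar>
</modeling>"""

root = ET.fromstring(SAMPLE)
# {section tag: {parameter name: stripped text}}
sections = {
    section.tag: {i.get("name"): (i.text or "").strip() for i in section}
    for section in root
}
print(sections["incar"])  # {'ISTART': '0', 'PREC': 'accurate'}
```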

3 Answers


The following works (using XPath):

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modeling>
   <generator>
      <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel</i>
      <i name="platform" type="string">LinuxIFC</i>
      <i name="date" type="string">2019 07 11</i>
      <i name="time" type="string">11:56:12</i>
   </generator>
   <incar>
      <i type="int" name="ISTART">0</i>
      <i type="string" name="PREC">accurate</i>
      <i type="int" name="ISPIN">2</i>
      <i type="int" name="NELMDL">-8</i>
      <i type="int" name="IBRION">2</i>
      <i name="EDIFF">0.00001000</i>
      <i name="EDIFFG">-0.01000000</i>
      <i type="int" name="NSW">200</i>
      <i type="int" name="ISIF">2</i>
      <i type="int" name="ISYM">2</i>
      <i name="ENCUT">750.00000000</i>
      <i name="POTIM">0.30000000</i>
   </incar>
</modeling>'''

root = ET.fromstring(xml)
names = ['ISTART', 'PREC']
for name in names:
    # Find the first <i> anywhere in the tree whose name attribute matches
    i = root.find(f'.//i[@name="{name}"]')
    print(i.text)

Output:

0
accurate
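`find` returns `None` when nothing matches, so `i.text` raises `AttributeError` for a name that is not in the file. `findtext` takes the same path plus a default value. A minimal sketch; `NOT_THERE` is a made-up name used only to show the fallback:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
 </incar>
</modeling>"""

root = ET.fromstring(SAMPLE)
values = {}
for name in ["ISTART", "PREC", "NOT_THERE"]:   # NOT_THERE is deliberately absent
    # findtext returns the matching element's text, or the default if no match
    text = root.findtext(f'.//i[@name="{name}"]', default=None)
    values[name] = text.strip() if text is not None else None
    print(f"{name}={values[name]}")
```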

Comment:

Thanks, but that's not what I meant. I am trying to get all the name=value pairs inside incar. Upvoted, though.

I added your sample XML to a file after appending the missing closing tag </modeling>.

Then:

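import xml.etree.ElementTree as ET

# ET.parse reads the file as bytes, so the encoding in the XML declaration
# (ISO-8859-1 here) is honoured instead of the platform default.
root = ET.parse('vasprun.xml').getroot()
for name in ['ISTART', 'PREC']:
    # The walrus operator (Python 3.8+) skips names that are not in the file
    if (t := root.find(f'.//i[@name="{name}"]')) is not None:
        print(f'{name}:{t.text.strip()}')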
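The values come back as strings, but each `<i>` carries a `type` hint. A sketch of converting them, under an assumption: elements with no `type` attribute (EDIFF, ENCUT, ...) are treated as floats, which matches the sample but is not checked against any schema. `convert` is a hypothetical helper, and any other `type` value raises `ValueError` rather than being guessed:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
  <i name="EDIFF">      0.00001000</i>
  <i name="ENCUT">    750.00000000</i>
 </incar>
</modeling>"""


def convert(item):
    """Convert an <i> element's text using its type attribute (assumed mapping)."""
    text = (item.text or "").strip()
    kind = item.get("type")          # None when the attribute is absent
    if kind == "int":
        return int(text)
    if kind == "string":
        return text
    if kind is None:
        return float(text)           # assumption: untyped values are reals
    raise ValueError(f"unhandled type {kind!r} for {item.get('name')}")


root = ET.fromstring(SAMPLE)
params = {i.get("name"): convert(i) for i in root.iterfind("incar/i")}
print(params)
```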


If the closing </modeling> tag is present, you can use XPath for the job.

The XPath to get the ISTART value is: //incar/*[@name='ISTART']

The XPath to get the PREC value is: //incar/*[@name='PREC']
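The standard library's `xml.etree.ElementTree` understands these expressions too, with one change: its limited XPath subset rejects an absolute `//` path on an element (it raises `SyntaxError`), so the path has to start with `.`. A minimal sketch on a shortened sample:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
 </incar>
</modeling>"""

root = ET.fromstring(SAMPLE)
# ".//" searches all descendants of root; a bare "//" would raise SyntaxError
istart = root.find(".//incar/*[@name='ISTART']")
prec = root.find(".//incar/*[@name='PREC']")
print(f"ISTART={int(istart.text)}")   # int() tolerates surrounding whitespace
print(f"Prec={prec.text.strip()}")
```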

Then, with lxml:


from lxml import etree

# The XML declaration must be the very first thing in the input, so the
# string starts directly with "<?xml" (no newline or indentation before it).
xml_doc = """<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
  <generator>
    <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel </i>
    <i name="platform" type="string">LinuxIFC </i>
    <i name="date" type="string">2019 07 11 </i>
    <i name="time" type="string">11:56:12 </i>
  </generator>
  <incar>
    <i type="int" name="ISTART">     0</i>
    <i type="string" name="PREC">accurate</i>
    <i type="int" name="ISPIN">     2</i>
    <i type="int" name="NELMDL">    -8</i>
    <i type="int" name="IBRION">     2</i>
    <i name="EDIFF">      0.00001000</i>
    <i name="EDIFFG">     -0.01000000</i>
    <i type="int" name="NSW">   200</i>
    <i type="int" name="ISIF">     2</i>
    <i type="int" name="ISYM">     2</i>
    <i name="ENCUT">    750.00000000</i>
    <i name="POTIM">      0.30000000</i>
  </incar>
</modeling>
"""
# recover=True makes lxml try to parse through malformed input instead of raising
parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
# lxml rejects str input that carries an encoding declaration, so pass bytes
# encoded to match the declared ISO-8859-1
xml_tree = etree.fromstring(xml_doc.encode('iso-8859-1'), parser=parser)
# xpath() always returns a list of matching elements
istart = xml_tree.xpath('//incar/*[@name="ISTART"]')
prec = xml_tree.xpath('//incar/*[@name="PREC"]')
print(f'ISTART={int(istart[0].text)}')
print(f'Prec={prec[0].text}')
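`recover=True` tells lxml to try hard to parse through broken XML, which can mask problems such as the missing `</modeling>` tag in the question. If you would rather fail loudly, the standard library raises `ET.ParseError` on the truncated sample. A minimal sketch; `is_well_formed` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

TRUNCATED = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
 </incar>
"""  # no closing </modeling>, as in the question


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, (line, column))."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, err.position   # position is a (line, column) tuple
    return True, None


print(is_well_formed(TRUNCATED))
print(is_well_formed(TRUNCATED + "</modeling>"))
```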





