0

I am trying to extract the following values from an xml file: NAME, Mode,LEVELS,Group,Type and after I want to make data.frame. The problem I having so far is that I cannot get <Name>ALICE</Name> variables and output data.frame format is different than I need.

Here is the some post that I used when I built my read_xml function

  1. https://www.geeksforgeeks.org/xml-parsing-python/
  2. Extracting text from XML using python
  3. How do I parse XML in Python?

here is the example xml file format

<?xml version="1.0"?>
<Body>
    <DocType>List</DocType>
    <DocVersion>1</DocVersion>
    <LIST>
            <Name>ALICE</Name>
            <Variable>
                <Mode>Hole</Mode>
                <LEVELS>1</LEVELS>
                <Group>11</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
            <Variable>
                <Mode>BEWEL</Mode>
                <LEVELS>2</LEVELS>
                <Group>22</Group>
                <Type>0</Type>
                <Paint />
            </Variable>

            <Name>WONDERLAND</Name>
            <Variable>
                <Mode>Mole</Mode>
                <LEVELS>1</LEVELS>
                <Group>11</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
            <Variable>
                <Mode>Matrix</Mode>
                <LEVELS>6</LEVELS>
                <Group>66</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
    </LIST>
</Body>

I built the following function;

xml_file = r"C:\xml.xml"

def read_xml(xml_file):

   etree = ET.parse(xml_file)
   root = etree.getroot()
   items = []
   for item in root.findall('./LIST/'):
      values  = {}
      for it in item:
         #print(it)
        values[it.tag] = it.text
      items.append(values)

   columns = ['Name','Mode', 'LEVELS','Group','Type']
   df = pd.DataFrame(items, columns = columns)

   return df


    print(read_xml(xml_file))

giving me this output

 Name    Mode LEVELS Group Type
0   NaN     NaN    NaN   NaN  NaN
1   NaN    Hole      1    11    0
2   NaN   BEWEL      2    22    0
3   NaN     NaN    NaN   NaN  NaN
4   NaN    Mole      1    11    0
5   NaN  Matrix      6    66    0

the expected output

            NAME      MODE   LEVELS  Group  Type
1            ALICE      Hole      1      11     0
2            ALICE      BEWEL     11     22     0      
3       WONDERLAND      MOLE      1      11     0      
4       WONDERLAND      MATRIX    6      66     0      

How can I get the expected output!!

Thx!

2 Answers 2

1

If tag is Name in loop then set to variable and last add to dictionary values:

import xml.etree.cElementTree as ET
def read_xml(xml_file):

   etree = ET.parse(xml_file)
   root = etree.getroot()
   items = []
   for item in root.findall('LIST/'):
       values = {}
       if (item.tag == 'Name'):
           name = item.text
           continue
       for it in item:
           values[it.tag] = it.text
       values['Name'] = name
       items.append(values)

   columns = ['Name','Mode', 'LEVELS','Group','Type']
   df = pd.DataFrame(items, columns = columns)

   return df

xml_file = 'xml.xml'
print(read_xml(xml_file))
         Name    Mode LEVELS Group Type
0       ALICE    Hole      1    11    0
1       ALICE   BEWEL      2    22    0
2  WONDERLAND    Mole      1    11    0
3  WONDERLAND  Matrix      6    66    0
Sign up to request clarification or add additional context in comments.

Comments

0
import xml.etree.ElementTree as ET

import pandas as pd

def xml_to_df(xml_file):

  tree = ET.parse(xml_file)

  root = tree.getroot()

  data = []
  for child in root:
    record = {}
    for subchild in child:
      record[subchild.tag] = subchild.text
    data.append(record)

  df = pd.DataFrame(data)
  return df

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.