I am new here and have been trying to write a small Python script to convert XML to CSV. Based on various posts here on Stack Overflow, I have come up with sample code that works fine. However, the data I am working with has multiple nested levels, and I am unsure how to extract the data at the leaf level.

This is what the data looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transmission>
    <TransmissionBody>
        <level1>
            <level2>
                <level3>
                    <level4>
                        <level5>
                            <level6>
                                <ColA>ABC</ColA>
                                <ColB>123</ColB>
                            </level6>
                        </level5>
                    </level4>
                </level3>
            </level2>
        </level1>
    </TransmissionBody>
</Transmission>

I am using the code below to try to convert the XML to CSV:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
final = {}
for elem in root:
    if len(elem):
        for c in elem:  # getchildren() was removed in Python 3.9
            final[c.tag] = c.text
    else:
        final[elem.tag] = elem.text

df = pd.DataFrame([final])
df.to_csv('file.csv')

However, this code only pulls level2, not ColA from level6.

Expected Output:

Transmission,TransmissionBody,level1,level2,level3,level4,level5,level6,ColA,ColB
,,,,,,,,ABC,123
,,,,,,,,DEF,456

Updated code:

import glob

allFiles = glob.glob(folder)
for file in allFiles:
    xmllist = [file]
    for xmlfile in xmllist:
        tree = ET.parse(xmlfile)
        root = tree.getroot()

        def f(elem, result):
            result[elem.tag] = elem.text
            # Iterate the element directly; getchildren() was removed in Python 3.9
            for c in elem:
                result = f(c, result)
            return result

        d = f(root, {})
        df = pd.DataFrame(d, index=['values'])

2 Answers

You have an indentation issue on

if len(elem):

I guess this should resolve it.

2 Comments

if len(elem): is on the baseline, what kind of indentation do you suggest on that line?
What makes you think the if () line is on the baseline? To me it looks to be part of the for elem in root: loop, so it should be indented.
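
For reference, a minimal sketch of why the loop in the question stops near the top of the tree regardless of indentation: iterating an element yields only its direct children, so two nested loops reach at most two levels below the root. The XML string here is a shortened stand-in for the question's file, not the real data.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's document (assumption: same nesting idea).
XML = """<Transmission>
    <TransmissionBody>
        <level1>
            <level2>
                <ColA>ABC</ColA>
            </level2>
        </level1>
    </TransmissionBody>
</Transmission>"""

root = ET.fromstring(XML)

# Iterating an element visits its direct children only.
children = [elem.tag for elem in root]
grandchildren = [c.tag for elem in root for c in elem]

# iter() walks the whole subtree, starting with the element itself.
all_tags = [elem.tag for elem in root.iter()]

print(children)
print(grandchildren)
print(all_tags)
```

The nested loops never get past `level1` here, while `iter()` reaches `ColA`.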

If I understood your question correctly, you need to traverse the XML tree, so you probably want to have a recursive function that does that. Something like the following:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

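def f(elem, result):
    # Record this element, then recurse into its children.
    result[elem.tag] = elem.text
    # Iterate the element directly; getchildren() was removed in Python 3.9.
    for c in elem:
        result = f(c, result)
    return result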

d = f(root, {})
df = pd.DataFrame(d, index=['values']).T
df

Out:

                 values
Transmission         \n
TransmissionBody     \n
level1               \n
level2               \n
level3               \n
level4               \n
level5               \n
level6               \n
ColA                ABC
ColB                123
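
The `\n` values above are the whitespace between nested tags, which ElementTree reports as the `text` of each container element. If only the leaf values are wanted, one option (a sketch, not the only way) is to walk the tree with `iter()` and keep the elements that have no children. The helper name `leaf_values` and the inline XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file (assumption, not the real data).
XML = """<Transmission>
    <TransmissionBody>
        <level1>
            <level6>
                <ColA>ABC</ColA>
                <ColB>123</ColB>
            </level6>
        </level1>
    </TransmissionBody>
</Transmission>"""

def leaf_values(root):
    """Return {tag: text} for elements that have no child elements."""
    result = {}
    for elem in root.iter():
        if len(elem) == 0:  # no children, so this is a leaf
            result[elem.tag] = (elem.text or "").strip()
    return result

leaves = leaf_values(ET.fromstring(XML))
print(leaves)
```

Note that a dict keyed by tag keeps only the last value when the same leaf tag appears more than once in a file.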

Update: Here is how to do it for multiple XML files. I've added another file similar to the original one, with the ColA and ColB rows replaced with

<ColA>DEF</ColA>
<ColB>456</ColB>

Here's the code:

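import glob

def f(elem, result):
    result[elem.tag] = elem.text
    # Iterate the element directly; getchildren() was removed in Python 3.9.
    for c in elem:
        result = f(c, result)
    return result

# Keep one dict per file so values from different files are not overwritten.
results = []
for file in glob.glob('*.xml'):
    tree = ET.parse(file)
    root = tree.getroot()
    results.append(f(root, {}))

# One row per file; transpose so each file becomes a column.
df = pd.DataFrame(results).T
df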

And the output:

                    0    1
Transmission       \n   \n
TransmissionBody   \n   \n
level1             \n   \n
level2             \n   \n
level3             \n   \n
level4             \n   \n
level5             \n   \n
level6             \n   \n
ColA              ABC  DEF
ColB              123  456

1 Comment

A question: why are you transposing it? The requirement was a CSV with headers in the top row, not in the first column, wasn't it?
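
On the point raised in the comment: if the goal is a CSV with headers in the top row and one row per file, the transpose can be dropped. Here is a hedged sketch using the standard library's `csv.DictWriter`. The two XML strings stand in for the files from the answer, and `flatten` is a hypothetical helper that mirrors `f` but strips whitespace.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-ins for the two XML files (assumption: flattened versions of the originals).
TEMPLATE = (
    "<Transmission><TransmissionBody><level6>"
    "<ColA>{a}</ColA><ColB>{b}</ColB>"
    "</level6></TransmissionBody></Transmission>"
)
documents = [TEMPLATE.format(a="ABC", b="123"), TEMPLATE.format(a="DEF", b="456")]

def flatten(elem, result):
    """Record every tag in document order; container tags get an empty string."""
    result[elem.tag] = (elem.text or "").strip()
    for child in elem:
        flatten(child, result)
    return result

# One dict (one CSV row) per document.
rows = [flatten(ET.fromstring(doc), {}) for doc in documents]

# StringIO keeps the sketch self-contained; swap in
# open('file.csv', 'w', newline='') to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With pandas, `pd.DataFrame(rows).to_csv('file.csv', index=False)` should give the same layout.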
