How to read XML file into Pandas Dataframe like Read XML Table in Excel

Question

I have an XML file and I am trying to iterate through its tags to convert it to a pandas DataFrame. My current process is to open the XML file in Excel as an "XML table", but this takes forever, so I am trying to find a similar process in Python.

I am trying to follow along with the code presented in numerous other Stack Overflow questions and articles, such as here, here, and here.

I believe there are 2 problems I am facing:

1. Does having the namespace affect how I read my XML?
2. I don't want to specify all of my tags, as in the solution in section 19.7.1.6 of the ElementTree documentation. I just want every tag to appear as a column for each "Security". If a Security doesn't have a given tag, the value should be null. I also don't want to write a nasty if-else chain.

The problem is that when I run the code:

import xml.etree.ElementTree as et

etree = et.parse(xml_path)
test = etree.getroot()

and try to iterate as suggested in the above links, I am not able to easily access the child nodes.

Sample File:

<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation>
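
For reference, a minimal sketch of why the child nodes are hard to reach in the sample file above: the default `xmlns` puts every element in that namespace, so ElementTree stores tags as `{namespace}name` and a plain `findall("Security")` matches nothing. The variable names (`SAMPLE`, `ns`, the `s` prefix) are illustrative assumptions, and the `{*}` wildcard assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# The sample file from the question, inlined so the sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation>"""

root = ET.fromstring(SAMPLE)

# The tag carries the namespace URI in braces, not just "SecurityInformation".
print(root.tag)

# An unqualified search finds nothing because no element is named plain "Security".
plain = root.findall("Security")

# Option 1: map a prefix of your choosing ("s" is arbitrary) to the namespace URI.
ns = {"s": "http://tempuri.org/SecurityInformation.xsd"}
qualified = root.findall("s:Security", ns)

# Option 2 (Python 3.8+): match the local name in any namespace.
wildcard = root.findall("{*}Security")

print(len(plain), len(qualified), len(wildcard))
```

So the namespace does matter for lookups, but it does not have to be spelled out for every tag.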

Why didn't you update your original question: stackoverflow.com/questions/61732011/… ? You did find the delete link; in that same row there is also an edit link. Please don't delete and re-post questions. That is frowned upon here and might cause you trouble down the road.
– rene, Commented May 11, 2020 at 16:54
@rene I originally had edited the question, but once I had edited it, the question was totally different from what I had originally asked. Is the better behavior to just leave the old (different) question and post a new one? It just seemed more logical to post a new question.
– rcwilkin1993, Commented May 11, 2020 at 17:07
@rene thanks. You don't see them as different now because I'd already changed it. What would you suggest is the best route to get my question answered at this point?
– rcwilkin1993, Commented May 11, 2020 at 17:16
Oh, and are you sure there are namespace attributes on that closing </SecurityInformation> tag? It would be the first time I encounter those.
– rene, Commented May 11, 2020 at 17:23

min · Accepted Answer · 2020-08-25 17:55:51Z

3

I've made a package for a similar use case. It could work here too.

pip install pandas_read_xml

Then you can do something like:

import pandas_read_xml as pdx

df = pdx.read_xml('filename.xml', ['SecurityInformation'])

To flatten, you could use

df = pdx.flatten(df)

or

df = pdx.fully_flatten(df)
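
If installing an extra package is not an option, the same idea can be sketched with only the standard library and pandas. This is an alternative approach, not what `pandas_read_xml` does internally. The helper names `local_name` and `securities_to_frame` are hypothetical, as is the `Ticker` tag added to the sample to show a missing value. It assumes each Security holds flat child elements with text only.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def securities_to_frame(xml_text, record_tag="Security"):
    """Build one DataFrame row per record element, one column per child tag."""
    root = ET.fromstring(xml_text)
    rows = [
        # Each child tag becomes a column; no per-tag if-else is needed.
        {local_name(child.tag): child.text for child in record}
        for record in root.iter()
        if local_name(record.tag) == record_tag
    ]
    # pandas fills tags that a record lacks with NaN.
    return pd.DataFrame(rows)


# Sample based on the question; the second Security and <Ticker> are made up.
SAMPLE = """<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
    <Security>
        <Country>Canada</Country>
        <Ticker>ABC</Ticker>
    </Security>
</SecurityInformation>"""

df = securities_to_frame(SAMPLE)
print(df)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`. Nested child elements would need extra handling that this sketch does not attempt.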

