1

I have an xml file and I am trying to iterate through the tags to convert it to a pandas dataframe. My current process is to open the XML file with excel as an "XML table" but this takes forever. Trying to find a similar process in Python.

I am trying to follow along with the code presented on numerous other Stack Overflow questions and articles such as here here and here

I believe there are 2 problems I am facing:

  1. Does having the namespace affect my xml?

  2. I don't want to specify all of my tags as seen as a solution in 19.7.1.6. of the Element Tree documentation. I just want all of my tags to appear as a column for each "Security." If it doesn't have that tag it should be null. I also do not want to do a nasty if-else.

The problem is that when I run the code:

import xml.etree.ElementTree as et

etree = et.parse(xml_path)
test = etree.getroot()

and try and iterate as suggested in the above links, I am not able to easily access the child nodes.

Sample File:

<?xml version="1.0"?>
<SecurityInformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://tempuri.org/SecurityInformation.xsd">
    <Security>
        <Country>United States</Country>
    </Security>
</SecurityInformation> 
8
  • 2
    why didn't you update your original question: stackoverflow.com/questions/61732011/… ? You did find the delete link, in that same row there is also an edit link. Please don't delete and re-post questions. That is frowned upon here and might cause you troubles down the road. Commented May 11, 2020 at 16:54
  • @rene i originally had edited the question but once I had edited the question was totally different from what I had originally asked. Is the better behavior to just leave the old (different) question and post a new one? It just seemed more logical to post a new question. Commented May 11, 2020 at 17:07
  • @rene thanks. You don't see them as different now because I'd already changed it. What would you suggest is the best route to get my question answered at this point? Commented May 11, 2020 at 17:16
  • Practice patience ... Commented May 11, 2020 at 17:21
  • Oh, and are you sure there are namespace attributes on that closing </SecurityInformation> tag? It would be the first time I encounter those. Commented May 11, 2020 at 17:23

1 Answer 1

3

I've made a package for similar use case. It could work here too.

pip install pandas_read_xml

you can do something like

import pandas_read_xml as pdx

df = pdx.read_xml('filename.xml', ['SecurityInformation'])

To flatten, you could

df = pdx.flatten(df)

or

df = pdx.fully_flatten(df)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.