Python, XML Structure to Dictionary

Question

I have an XML file as follows:

<?xml version="1.0"?>
<max:SyncObject xmlns:max="http://www.ibm.com/max">
  <max:ObjectSet>
    <max:PARENT action="AddChange">
      <max:FIELD1>string</max:FIELD1>
      <max:FIELD2>string</max:FIELD2>
      <max:FIELD3>string</max:FIELD3>
      <max:FIELD4>string</max:FIELD4>
      <max:FIELD5>string</max:FIELD5>
      <max:FIELD6>string</max:FIELD6>
      <max:FIELD7>string</max:FIELD7>
      <max:CHILD1 action="Ignored">
        <max:CH1FIELD1 action="Ignored">
          <max:CH1SUB1>string</max:CH1SUB1>
        <max:CH1FIELD2>string</max:CH1FIELD2>
      </max:CHILD1>
      <max:CHILD2 action="Ignored">
        <max:CH2FIELD1>string</max:CH2FIELD1>
      </max:CHILD2>
    </max:PARENT>
  </max:ObjectSet>
</max:SyncObject>

and my end result that I want to achieve is as follows:

{'PARENT': ['FIELD1', 'FIELD2', 'FIELD3', 'FIELD4', 'FIELD5', 'FIELD6', 'FIELD7', 'CHILD1', 'CHILD2']}, {'CHILD1': ['CH1FIELD1', 'CH1FIELD2'], 'CHILD2': ['CH2FIELD1'], 'CH1FIELD1': ['CH1SUB1']}

I have tried several different methods of extracting the FIELD1, FIELD2, ... tags from the XML file while maintaining the structure. As you can see, the PARENT dictionary is separate from the rest and contains all tags exactly one level below it. The same is true for the child tags. The action attribute is not needed, as it will be specified by other means within the class.

It seems that most lxml and ElementTree examples are geared toward extracting the attributes of XML tags rather than the tag names themselves.

Could anyone point me in the right direction for extracting the tag names (field names) without the prefix, value, or attributes, while preserving the structure?

THANKS!

alecxe · Accepted Answer · 2015-02-23 21:04:16Z

1

First of all, your XML data is not well-formed: the closing </max:CH1FIELD1> tag is missing.

To convert it to a Python data structure, use xmltodict:

import xmltodict

data = """<?xml version="1.0"?>
<max:SyncObject xmlns:max="http://www.ibm.com/max">
  <max:ObjectSet>
    <max:PARENT action="AddChange">
      <max:FIELD1>string</max:FIELD1>
      <max:FIELD2>string</max:FIELD2>
      <max:FIELD3>string</max:FIELD3>
      <max:FIELD4>string</max:FIELD4>
      <max:FIELD5>string</max:FIELD5>
      <max:FIELD6>string</max:FIELD6>
      <max:FIELD7>string</max:FIELD7>
      <max:CHILD1 action="Ignored">
        <max:CH1FIELD1 action="Ignored">
          <max:CH1SUB1>string</max:CH1SUB1>
        </max:CH1FIELD1>
        <max:CH1FIELD2>string</max:CH1FIELD2>
      </max:CHILD1>
      <max:CHILD2 action="Ignored">
        <max:CH2FIELD1>string</max:CH2FIELD1>
      </max:CHILD2>
    </max:PARENT>
  </max:ObjectSet>
</max:SyncObject>"""

# Mapping the namespace URI to None drops the "max:" prefix from the keys.
d = xmltodict.parse(data,
                    process_namespaces=True,
                    namespaces={'http://www.ibm.com/max': None})
print(d)
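
If a third-party dependency is not wanted, the same tag structure can probably be collected with the standard library. The following is only a sketch using xml.etree.ElementTree, not part of the xmltodict approach above. The helper names local_name and tag_structure are made up for illustration, and the sample is a shortened, well-formed version of the XML from the question.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports namespaced tags as '{uri}name'; keep only 'name'.
    return tag.rsplit('}', 1)[-1]


def tag_structure(xml_text):
    """Map every element that has children to the list of its child tag names."""
    root = ET.fromstring(xml_text)
    structure = {}
    for elem in root.iter():
        children = [local_name(child.tag) for child in elem]
        if children:
            structure[local_name(elem.tag)] = children
    return structure


# Shortened sample (an assumption for illustration, not the full file).
sample = """<?xml version="1.0"?>
<max:SyncObject xmlns:max="http://www.ibm.com/max">
  <max:ObjectSet>
    <max:PARENT action="AddChange">
      <max:FIELD1>string</max:FIELD1>
      <max:CHILD1 action="Ignored">
        <max:CH1FIELD1 action="Ignored">
          <max:CH1SUB1>string</max:CH1SUB1>
        </max:CH1FIELD1>
        <max:CH1FIELD2>string</max:CH1FIELD2>
      </max:CHILD1>
    </max:PARENT>
  </max:ObjectSet>
</max:SyncObject>"""

structure = tag_structure(sample)
print(structure['PARENT'])
print(structure['CHILD1'])
```

Note that the result also contains entries for the wrapper elements (SyncObject, ObjectSet), and that an element name occurring more than once would overwrite its earlier entry, so this only fits documents where each parent tag is unique.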


2 Comments

txUTSA Over a year ago

Thanks, can you explain what this returns and how to manipulate it?

alecxe Over a year ago

@txDMTN sure, it returns an OrderedDict structure, which you can manipulate like a normal dict, except that it preserves the order of the elements.
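
As a rough illustration of manipulating that result, the nested mapping can be walked recursively to build the tag lists asked for in the question. This is a sketch under assumptions: it relies on xmltodict's default convention of storing attributes under keys starting with '@' and text under '#text', the function name child_tags is made up, and the dict d below is a hand-written stand-in for part of the parsed output rather than actual output.

```python
def child_tags(node, name, out=None):
    """Collect {tag: [child tags]} from an xmltodict-style nested mapping."""
    if out is None:
        out = {}
    if isinstance(node, dict):
        # Skip attribute keys such as '@action' and text keys such as '#text'.
        children = [k for k in node if not k.startswith(('@', '#'))]
        if children:
            out[name] = children
        for k in children:
            child_tags(node[k], k, out)
    elif isinstance(node, list):
        # Repeated sibling tags are parsed into a list of nodes.
        for item in node:
            child_tags(item, name, out)
    return out


# Hand-written stand-in for part of the parsed result (an assumption).
d = {'SyncObject': {'ObjectSet': {'PARENT': {
    '@action': 'AddChange',
    'FIELD1': 'string',
    'CHILD1': {'@action': 'Ignored',
               'CH1FIELD1': {'@action': 'Ignored', 'CH1SUB1': 'string'},
               'CH1FIELD2': 'string'},
    'CHILD2': {'@action': 'Ignored', 'CH2FIELD1': 'string'},
}}}}

parent = d['SyncObject']['ObjectSet']['PARENT']
result = child_tags(parent, 'PARENT')
print(result)
```

Leaf values (plain strings) produce no entry, so only tags that actually have child tags appear as keys.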
