I have the following test.xml:

<root>
<parent>
    <ID>1</ID>
    <child1>Value1</child1>
    <child2>value11</child2>
    <child3>
       <subchild>value111</subchild>
    </child3>
</parent>
<parent>
    <ID>2</ID>
    <child1>value2</child1>
    <child2>value22</child2>
    <child2>value333</child2>
</parent>
<parent>
    <ID>3</ID>
    <child1>value3</child1>
    <child2>value33</child2>
</parent>
<parent>
    <ID>4</ID>
    <child1>value4</child1>
    <child2>value44</child2>
</parent>
</root>

What I'm trying to accomplish is the following: I want to iterate through test.xml and, for every parent, put all of its child nodes in a dictionary where the tag is the key and the text is the value. Once I get to the end of the parent, I want to add that dictionary to the database, reset it, and move on to the next parent.

So for the first parent I would want

    insert = {'ID':1,'child1':'value1','child2':'value11','subchild':'value111'}

I would then use it in an SQL query, move on to the next parent, reset the dictionary, and do the same thing. Not every parent has the same number of children, and some children have subchildren.

I have tried with:

    from elementtree import ElementTree as ET

    value = []
    tag = []

    # tree is the already-parsed test.xml (loading omitted)
    for parent in tree.getiterator():
        for child in parent:
            value.append(child.text)
            tag.append(child.tag)

But I couldn't figure out how to get my desired results. I left out retrieving and opening the XML to keep the post as simple as possible. This is the method I was attempting to use, but I don't think it's the right one, because I haven't been able to stop the iteration at the end of the parent tag in order to insert.

Any help would be greatly appreciated! Thanks.

2 Answers

Try this using the lxml library:

from lxml import etree

source = """
<root>
<parent>
    <ID>1</ID>
    <child1>Value1</child1>
    <child2>value11</child2>
    <child3>
       <subchild>value111</subchild>
    </child3>
</parent>
<parent>
    <ID>2</ID>
    <child1>value2</child1>
    <child2>value22</child2>
    <child2>value333</child2>
</parent>
<parent>
    <ID>3</ID>
    <child1>value3</child1>
    <child2>value33</child2>
</parent>
<parent>
    <ID>4</ID>
    <child1>value4</child1>
    <child2>value44</child2>
</parent>
</root>
"""

document = etree.fromstring(source)
inserts = []

id_number = 3

for parent in document.findall('parent'):
    # Skip every parent whose ID does not match the one requested.
    if parent.findtext('ID') != str(id_number):
        continue
    insert = {}
    for element in parent.iterdescendants():
        # Keep only leaf elements (those with no children of their own).
        if len(element) == 0:
            insert[element.tag] = element.text
    inserts.append(insert)

print(inserts)
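
The question also mentions using each dictionary in an SQL query. Here is a minimal sketch of that step, assuming SQLite. The `parents` table, its columns, and the `insert_record` helper are hypothetical, since the question does not describe the real schema. Column names come from a fixed list rather than from the XML, because SQL placeholders can only stand in for values, not for identifiers.

```python
import sqlite3

# Hypothetical schema: the question does not say what the table looks like.
COLUMNS = ("ID", "child1", "child2", "subchild")


def insert_record(conn, record):
    """Insert one per-parent dictionary as a row; missing keys become NULL."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    values = [record.get(name) for name in COLUMNS]
    conn.execute(
        "INSERT INTO parents ({}) VALUES ({})".format(column_list, placeholders),
        values,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parents (ID TEXT, child1 TEXT, child2 TEXT, subchild TEXT)"
)

# Same shape as the dictionaries collected from the XML for ID 3.
inserts = [{"ID": "3", "child1": "value3", "child2": "value33"}]
for record in inserts:
    insert_record(conn, record)
conn.commit()

rows = conn.execute(
    "SELECT ID, child1, child2, subchild FROM parents"
).fetchall()
print(rows)
```

Calling `insert_record` once per parent, right after its dictionary is built, gives the "insert, reset, move on" flow described in the question.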

1 Comment

Let's say I only wanted the parent node where the ID is 3. How would I accomplish this? By that I mean the parent node with all the info in it that the above code extracted, but only if the ID is 3.

Python also ships with an etree API in its standard library, although it lacks pretty printing and some other features that lxml has: http://docs.python.org/library/xml.etree.elementtree.html
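
As a rough illustration of that standard-library route, the sketch below builds one dictionary per parent with `xml.etree.ElementTree`. The `parent_records` generator is a made-up helper name, and the sample is shortened to the first two parents from the question. Repeated tags (the two `child2` elements under ID 2) overwrite each other, so the last value wins.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
<parent>
    <ID>1</ID>
    <child1>Value1</child1>
    <child2>value11</child2>
    <child3>
       <subchild>value111</subchild>
    </child3>
</parent>
<parent>
    <ID>2</ID>
    <child1>value2</child1>
    <child2>value22</child2>
    <child2>value333</child2>
</parent>
</root>"""


def parent_records(root):
    """Yield one {tag: text} dictionary per <parent>, leaf elements only."""
    for parent in root.findall("parent"):
        record = {}
        for element in parent.iter():
            if element is parent:
                continue  # iter() also yields the parent itself
            if len(element) == 0:  # no children, so this is a leaf
                record[element.tag] = element.text
        yield record


root = ET.fromstring(SOURCE)
records = list(parent_records(root))
for record in records:
    print(record)
```

Because the generator yields a fresh dictionary for each parent, the caller can insert it into the database and move on without resetting anything by hand.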
