Merging XML elements while keeping their contents, using Python

Question

I've been looking around for a way to remove an element from an XML document while keeping its contents, using Python, but I haven't been able to find an answer that works.

Basically, I received an XML document in the following format (example):

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

What I have to do is merge element2 and element3 into a single element1, so that the output XML document looks like this:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

I would appreciate some tips on my (hopefully) simple problem.

Note: I am somewhat new to Python as well, so bear with me.

Do you want to remove element1, or do you want to merge them? – tobias_k, Aug 1, 2014 at 8:17

First you need to find all the content between the <text></text> tags and append all elements with those values to an array. Then you can create your new XML file. – Fox_01, Aug 1, 2014 at 8:32

tobias_k · Accepted Answer · 2014-08-01 09:41:58Z


This might not be the prettiest of solutions, but since there's no other answer yet...

You could just search for a closing </element1> tag followed by an opening <element1> tag (allowing for whitespace before and between them) and replace that pair with the empty string:

import re

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

# Delete every "</element1> <element1>" boundary, including the whitespace around it
print(re.sub(r"\s*</element1>\s*<element1>", "", xml))

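More generally, re.sub(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>", "", xml) merges all consecutive instances of the same element: the name in the closing tag is captured as a group, and \1 requires the following opening tag to have that same name. Note that this only matches opening tags without attributes, and only tag names made of letters, digits and underscores.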

Output, in both cases:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>
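Here is a minimal sketch of the generalized pattern wrapped in a helper. It assumes the input is a plain string and that the repeated opening tags carry no attributes. The function name merge_consecutive and the sample document are hypothetical. Because re.sub scans left to right and each match consumes only one closing/opening pair, a run of three or more siblings collapses into a single element in one call.

```python
import re

# Closing tag, optional whitespace, then an opening tag with the same name;
# \1 refers back to the name captured by ([a-zA-Z0-9_]+).
MERGE_PATTERN = re.compile(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>")


def merge_consecutive(xml_text):
    """Merge adjacent sibling elements sharing a tag name (hypothetical helper)."""
    return MERGE_PATTERN.sub("", xml_text)


sample = """<root>
    <element1><element2>a</element2></element1>
    <element1><element3>b</element3></element1>
    <element1><element4>c</element4></element1>
</root>"""

print(merge_consecutive(sample))
```

All three element1 blocks end up as one. An opening tag such as <a x="1"> does not match <\1>, so siblings that carry attributes are left untouched.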

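For more complex documents (attributes, comments, namespaces), you might want to use one of Python's XML libraries instead, such as the standard library's xml.etree.ElementTree or the third-party lxml.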
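A sketch of the library-based route uses the standard library's xml.etree.ElementTree. It assumes the goal is to fold each run of consecutive same-tag siblings into the first one. The function name merge_adjacent_siblings is hypothetical. Because it works on the parsed tree, attributes and formatting do not prevent a match. There are two assumptions to be aware of: attributes on the later duplicates are dropped, and text placed directly inside a duplicate (rather than inside its children) is discarded. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def merge_adjacent_siblings(parent):
    """Recursively fold runs of same-tag siblings into the first one (hypothetical helper)."""
    previous = None
    for child in list(parent):  # iterate over a copy because we remove items
        if previous is not None and child.tag == previous.tag:
            # Move the duplicate's children into the earlier element,
            # then drop the now-redundant duplicate from the parent.
            previous.extend(list(child))
            parent.remove(child)
        else:
            previous = child
    # Recurse after merging so newly adjacent grandchildren are merged too.
    for child in parent:
        merge_adjacent_siblings(child)


source = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

root = ET.fromstring(source)
merge_adjacent_siblings(root)
ET.indent(root, space="    ")  # re-indent, since moved elements keep their old whitespace
result = ET.tostring(root, encoding="unicode")
print(result)
```

The printed document has a single element1 that contains element2 followed by element3, matching the desired output in the question.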


This should work until I find a better way using an XML library. Thank you very much. – Alex-C
