I've been looking around for a way to remove an element from an XML document while keeping its contents, using Python, but I haven't been able to find an answer that works.

Basically, I received an XML document in the following format (example):

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

What I have to do is merge element2 and element3 into a single element1, so that the output XML document looks like this:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

I would appreciate some tips on my (hopefully) simple problem.

Note: I am somewhat new to Python as well, so bear with me.

  • Do you want to remove element1, or do you want to merge them? Commented Aug 1, 2014 at 8:17
  • Merge them, my bad. Edited the main post as well. Commented Aug 1, 2014 at 8:18
  • stackoverflow.com/questions/15921642/… Commented Aug 1, 2014 at 8:22
  • First, find everything between the <text></text> tags and append those elements to a list. Then you can build your new XML file from that list. Commented Aug 1, 2014 at 8:32

1 Answer

This might not be the prettiest of solutions, but since there's no other answer yet...

You could just search for, e.g., </element1><element1> and replace it with the empty string.

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

import re
# Drop each "</element1> <element1>" boundary, together with the whitespace before and inside it
print(re.sub(r"\s*</element1>\s*<element1>", "", xml))

Or, more generally, use re.sub(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>", "", xml) to merge all consecutive instances of the same element: the group captures the name of the closing tag, and the backreference \1 requires the next opening tag to have that same name.

Output, in both cases:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>
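
To show which boundaries the general pattern does and does not touch, here is a small sketch. The helper name merge_adjacent and the sample string are made up for illustration, and it assumes plain tags without attributes or namespaces:

```python
import re

# A closing tag, optional whitespace, then an opening tag with the same name.
# Assumption: tags carry no attributes or namespace prefixes.
ADJACENT_SAME_TAG = re.compile(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>")

def merge_adjacent(xml_text):
    """Remove every '</name><name>' boundary so the two elements become one."""
    return ADJACENT_SAME_TAG.sub("", xml_text)

# Hypothetical sample: two adjacent <a> elements followed by a <d> element.
sample = "<root><a><b>1</b></a><a><c>2</c></a><d>3</d></root>"
print(merge_adjacent(sample))
# -> <root><a><b>1</b><c>2</c></a><d>3</d></root>
```

Only the </a><a> boundary is removed; </a><d> is left alone because the names differ. Note that the pattern is purely textual, so it would also fuse two adjacent leaf elements (e.g. two consecutive <text> elements) into one, which may not be what you want.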

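For more complex documents (attributes, namespaces, comments, CDATA), a regex cannot reliably tell where elements begin and end, so you might want to use one of Python's XML libraries instead, such as xml.etree.ElementTree or lxml.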
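
As a rough sketch of what that could look like with the standard library's xml.etree.ElementTree: the function name merge_adjacent_siblings is hypothetical, and the sketch assumes the elements to be merged are direct children of the same parent and contain only child elements (any text sitting directly inside the second element1 would be dropped).

```python
import xml.etree.ElementTree as ET

def merge_adjacent_siblings(parent, tag):
    """Fold each child named `tag` into the sibling just before it
    when that sibling has the same tag."""
    previous = None
    for child in list(parent):  # iterate over a copy, since children are removed
        if child.tag == tag and previous is not None and previous.tag == tag:
            previous.extend(list(child))  # move the grandchildren over
            parent.remove(child)          # drop the now-redundant element
        else:
            previous = child

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

root = ET.fromstring(xml)
merge_adjacent_siblings(root, "element1")

# ET.indent only exists in Python 3.9+; without it the whitespace is uneven
if hasattr(ET, "indent"):
    ET.indent(root, space="    ")
print(ET.tostring(root, encoding="unicode"))
```

Unlike the regex, this works on the parsed tree, so attributes or unusual whitespace in the tags do not affect the matching. Whether attributes of the removed element1 should be copied onto the surviving one is a decision the sketch leaves open.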

1 Comment

This should work until I find a better way using an XML library. Thank you very much.
