How to Add Prefix in XML in Python

Question

I have an XML document like this:

<eo-gateway>
<interface-code>AAA</interface-code>
<supplier-code>XXX</supplier-code>
<authors>
<author type="first">
<forename>John</forename>
<surname>Smith</surname>
</author>
</authors>
</eo-gateway>

I need to produce the following XML by adding the prefix "art" to every tag:

<art:eo-gateway>
<art:interface-code>AAA</art:interface-code>
<art:supplier-code>XXX</art:supplier-code>
<art:authors>
<art:author type="first">
<art:forename>John</art:forename>
<art:surname>Smith</art:surname>
</art:author>
</art:authors>
</art:eo-gateway>

Thanks for your help.

Your target XML is invalid. art: would be a namespace prefix and needs to be declared: <art:eo-gateway xmlns:art="urn:art">. But this has the same meaning as <eo-gateway xmlns="urn:art">.
– ThW, Commented Nov 7, 2014 at 9:24
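
The point in this comment can be sketched with the standard library: instead of pasting "art:" into tag names, bind the prefix to a namespace and let the serializer write the declaration. This is only a sketch. The URI "urn:art" is the placeholder from the comment, not a value the question supplies, and the sample document is shortened.

```python
from xml.etree import ElementTree as etree

# "urn:art" is a placeholder namespace URI taken from the comment above;
# substitute whatever URI the consumer of the XML actually expects.
ART_NS = "urn:art"

xml_content = """<eo-gateway>
<interface-code>AAA</interface-code>
<authors>
<author type="first">
<forename>John</forename>
</author>
</authors>
</eo-gateway>"""

# Ask the serializer to use the "art" prefix for this namespace URI.
etree.register_namespace("art", ART_NS)

root = etree.fromstring(xml_content)
for element in root.iter():
    # "{uri}local-name" is how ElementTree spells a namespaced tag.
    element.tag = "{%s}%s" % (ART_NS, element.tag)

result = etree.tostring(root, encoding="unicode")
print(result)
```

With this approach the root element carries xmlns:art="urn:art", so the prefixed output can be parsed again by a namespace-aware parser.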

Vincent Beltman · Accepted Answer · 2014-11-07 09:23:15Z

1

Use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

from bs4 import BeautifulSoup
soup = BeautifulSoup('''<eo-gateway>
<interface-code>AAA</interface-code>
<supplier-code>XXX</supplier-code>
<authors>
<author type="first">
<forename>John</forename>
<surname>Smith</surname>
</author>
</authors>
</eo-gateway>''')

for i in soup.find_all():
    i.name = 'art:' + i.name

And if you don't want to prefix some tags, you could do this instead:

# Compare against the original tag names, before any prefix is added
except_these = ['body', 'html']

for i in soup.find_all():
    name = i.name
    if name not in except_these:
        i.name = 'art:' + name
print(soup)

Output:

<art:body>
<art:eo-gateway>
<art:interface-code>AAA</art:interface-code>
<art:supplier-code>XXX</art:supplier-code>
<art:authors>
<art:author type="first">
<art:forename>John</art:forename>
<art:surname>Smith</art:surname>
</art:author>
</art:authors>
</art:eo-gateway>
</art:body>

Or you could even check whether it already has 'art:' in front of it:

if not name.startswith('art:'):
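
The exclusion list and the "already prefixed" check can be combined in one small helper. This is a minimal sketch; add_prefix and its parameters are hypothetical names, not part of BeautifulSoup.

```python
def add_prefix(name, prefix="art:", skip=("html", "body")):
    """Return name with prefix added, unless it is skipped or already prefixed.

    add_prefix is a hypothetical helper for illustration only.
    """
    if name in skip or name.startswith(prefix):
        return name
    return prefix + name


names = ["html", "body", "eo-gateway", "art:authors"]
prefixed = [add_prefix(n) for n in names]
print(prefixed)
```

Inside the loop over soup.find_all() this would be used as i.name = add_prefix(i.name). Because already prefixed names are returned unchanged, running the loop twice does not produce "art:art:" tags.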

edited Nov 7, 2014 at 9:23

answered Nov 7, 2014 at 8:44

Vincent Beltman


3 Comments

Arbin Bulaybulay Over a year ago

Thanks Vincent. This works, but <art:html><art:body></art:body></art:html> are generated automatically and wrapped around my tags. How can I exclude them? Thanks

Arbin Bulaybulay Over a year ago

I've tried the solution above but still get <art:html><art:body></art:body></art:html>. I'm also confused by the output you've provided, since <art:body> is still there even though it is listed in except_these.

Vincent Beltman Over a year ago

Do you want to delete these tags?

Arbin Bulaybulay · Accepted Answer · 2014-11-08 04:20:46Z

0

I found a way to solve this with xml.etree, which avoids the extra html and body tags that BeautifulSoup adds.

from xml.etree import ElementTree as etree

xml_content = """<eo-gateway>
    <interface-code>AAA</interface-code>
    <supplier-code>XXX</supplier-code>
    <authors>
    <author type="first">
    <forename>John</forename>
    <surname>Smith</surname>
    </author>
    </authors>
    </eo-gateway>"""

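document_root = etree.ElementTree(etree.fromstring(xml_content))
# iter() replaces getiterator(), which was removed in Python 3.9
for element in document_root.iter():
    element.tag = "art:" + element.tag
# encoding="unicode" makes tostring() return a str instead of bytes on Python 3
print(etree.tostring(document_root.getroot(), encoding="unicode"))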
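
One caveat with literal "art:" tag names is that the prefix is never declared, so a namespace-aware parser rejects the result. The sketch below checks this by re-parsing the output, then declares the prefix by hand. Setting an "xmlns:art" attribute is a workaround that relies on ElementTree writing such names verbatim, and "urn:art" is a placeholder URI, not something given in the question.

```python
from xml.etree import ElementTree as etree

root = etree.fromstring(
    "<eo-gateway><interface-code>AAA</interface-code></eo-gateway>"
)
for element in root.iter():
    element.tag = "art:" + element.tag

# Without a declaration for "art", the output cannot be parsed back.
undeclared = etree.tostring(root, encoding="unicode")
try:
    etree.fromstring(undeclared)
    reparsed_ok = True
except etree.ParseError:
    reparsed_ok = False  # the parser reports an unbound prefix

# Workaround: declare the prefix on the root element by hand.
# "urn:art" is a placeholder URI.
root.set("xmlns:art", "urn:art")
declared = etree.tostring(root, encoding="unicode")
reparsed = etree.fromstring(declared)

print(reparsed_ok)
print(reparsed.tag)
```

If the consumer of the file is a plain text processor, the undeclared form may be acceptable; for anything that parses XML with namespace support, the declaration is needed.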

answered Nov 8, 2014 at 4:20

Arbin Bulaybulay
