0

I have an XML document like this:

<eo-gateway>
<interface-code>AAA</interface-code>
<supplier-code>XXX</supplier-code>
<authors>
<author type="first">
<forename>John</forename>
<surname>Smith</surname>
</author>
</authors>
</eo-gateway>

I need to end up with XML like this, with the prefix "art" added to every tag:

<art:eo-gateway>
<art:interface-code>AAA</art:interface-code>
<art:supplier-code>XXX</art:supplier-code>
<art:authors>
<art:author type="first">
<art:forename>John</art:forename>
<art:surname>Smith</art:surname>
</art:author>
</art:authors>
</art:eo-gateway>

Thanks for your help.

1
  • Your target XML is invalid. art: would be a namespace prefix and needs to be declared, e.g. <art:eo-gateway xmlns:art="urn:art">. But that has the same meaning as <eo-gateway xmlns="urn:art">. Commented Nov 7, 2014 at 9:24
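
To illustrate the point in this comment, here is a minimal sketch using only the standard library. The helper name root_tag is made up for the example, and "urn:art" is just the placeholder URI from the comment, not a real namespace. It shows that a namespace-aware parser rejects the undeclared prefix, and that the prefixed and default-namespace forms parse to the same element name.

```python
from xml.etree import ElementTree as ET


def root_tag(xml_text):
    """Return the parsed root tag, or None if the parser rejects the text.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        return None


# Prefix used without a declaration: the parser reports an unbound prefix.
undeclared = root_tag('<art:eo-gateway></art:eo-gateway>')

# Prefix declared on the root element ("urn:art" is a placeholder URI).
prefixed = root_tag('<art:eo-gateway xmlns:art="urn:art"></art:eo-gateway>')

# Same namespace declared as the default namespace, no prefix needed.
default_ns = root_tag('<eo-gateway xmlns="urn:art"></eo-gateway>')

print(undeclared)   # None
print(prefixed)     # {urn:art}eo-gateway
print(default_ns)   # {urn:art}eo-gateway
```

So if the consumer of the document is namespace-aware, the prefix on its own does not matter; what matters is the namespace the elements end up in.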

2 Answers

1

Use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

from bs4 import BeautifulSoup
soup = BeautifulSoup('''<eo-gateway>
<interface-code>AAA</interface-code>
<supplier-code>XXX</supplier-code>
<authors>
<author type="first">
<forename>John</forename>
<surname>Smith</surname>
</author>
</authors>
</eo-gateway>''')

for i in soup.find_all():
    i.name = 'art:' + i.name

And if you want to leave some tags unprefixed, you could do this:

except_these = ['art:body', 'art:html']

for i in soup.find_all():
    name = i.name
    if name not in except_these:
        i.name = 'art:' + i.name
print(soup)

Output:

<art:body>
<art:eo-gateway>
<art:interface-code>AAA</art:interface-code>
<art:supplier-code>XXX</art:supplier-code>
<art:authors>
<art:author type="first">
<art:forename>John</art:forename>
<art:surname>Smith</art:surname>
</art:author>
</art:authors>
</art:eo-gateway>
</art:body>

Or you could even check whether it already has 'art:' in front of it:

if not name.startswith('art:'):

3 Comments

Thanks, Vincent. This works; however, <art:html><art:body></art:body></art:html> are auto-generated and added around the tags. How can I exclude them? Thanks.
I've tried the solution above but still got <art:html><art:body></art:body></art:html>. I'm also confused by the output you've provided, since it still contains <art:body> even though that tag is listed in except_these.
Do you want to delete these tags?
0

I've found a way to solve this using xml.etree, which avoids the extra html and body tags that BeautifulSoup adds.

from xml.etree import ElementTree as etree

xml_content = """<eo-gateway>
    <interface-code>AAA</interface-code>
    <supplier-code>XXX</supplier-code>
    <authors>
    <author type="first">
    <forename>John</forename>
    <surname>Smith</surname>
    </author>
    </authors>
    </eo-gateway>"""

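document_root = etree.ElementTree(etree.fromstring(xml_content))
# iter() replaces getiterator(), which was removed in Python 3.9
for element in document_root.iter():
    element.tag = "art:" + element.tag
# encoding="unicode" returns a str rather than bytes on Python 3
print(etree.tostring(document_root.getroot(), encoding="unicode"))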
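
Note that the snippet above only rewrites the tag strings, so the result still has the undeclared prefix mentioned in the comment on the question. If the output needs to be read by a namespace-aware parser, a possible variant is sketched below. It assumes the placeholder URI "urn:art" from that comment (substitute the real namespace URI) and a shortened copy of the sample document; ART_NS is a name introduced only for this example.

```python
from xml.etree import ElementTree as ET

# Assumption: placeholder namespace URI taken from the comment on the question.
ART_NS = "urn:art"

xml_content = """<eo-gateway>
<interface-code>AAA</interface-code>
<authors>
<author type="first">
<forename>John</forename>
</author>
</authors>
</eo-gateway>"""

# Ask ElementTree to serialize this namespace with the "art" prefix.
ET.register_namespace("art", ART_NS)

root = ET.fromstring(xml_content)
for element in root.iter():
    # Put every element into the namespace using ElementTree's {uri}name form.
    element.tag = "{%s}%s" % (ART_NS, element.tag)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The serializer then writes the art: prefix on every element and adds xmlns:art="urn:art" to the root, so the result can be parsed again. Attributes such as type="first" are left without a prefix.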
