I need to get elements from an XML document as strings. I am working with the XML below.

<xml>
    <prot:data xmlns:prot="prot">
        <product-id-template>
            <prot:ProductId>PRODUCT_ID</prot:ProductId>
        </product-id-template>

        <product-name-template>
            <prot:ProductName>PRODUCT_NAME</prot:ProductName>
        </product-name-template>

        <dealer-template>
            <xsi:Dealer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">DEALER</xsi:Dealer>
        </dealer-template>
    </prot:data>
</xml>

I tried the following code:

from xml.etree import ElementTree as ET

def get_template(xpath, namespaces):
    # Return the first element matching xpath, or None if nothing matches.
    tree = ET.parse('cdata.xml')
    return tree.getroot().find(xpath, namespaces=namespaces)

namespace = {"prot" : "prot"}
aa = get_template(".//prot:ProductId", namespace)
print(ET.tostring(aa).decode())

Actual output:

<ns0:ProductId xmlns:ns0="prot">PRODUCT_ID</ns0:ProductId>

Expected output:

<prot:ProductId>PRODUCT_ID</prot:ProductId>

The xmlns declaration should be kept only where it appears in the source document and omitted where it does not. For example, the ProductId element inside product-id-template has no xmlns declaration of its own, so it should be returned without one. The Dealer element inside dealer-template does carry an xmlns declaration, so it should be returned with it.

How can I achieve this?

1 Answer

You can remove the xmlns declarations from the serialized string with a regex.

import re
# ...
with_ns = ET.tostring(aa).decode()
# Raw string, so that \w is not treated as an invalid string escape.
no_ns = re.sub(r' xmlns(:\w+)?="[^"]+"', '', with_ns)
print(no_ns)
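
Note that the regex only strips the declaration; the generated ns0 prefix would still be there. A minimal sketch of one way to get the prot prefix back, assuming it is acceptable to register the prefix globally with ET.register_namespace before serializing. The SAMPLE string is a shortened copy of the XML from the question, and the .strip() is there because ET.tostring also emits the whitespace that follows the element in the source:

```python
import re
from xml.etree import ElementTree as ET

# Shortened copy of the XML from the question (assumption: inline string
# instead of the cdata.xml file).
SAMPLE = """<xml>
    <prot:data xmlns:prot="prot">
        <product-id-template>
            <prot:ProductId>PRODUCT_ID</prot:ProductId>
        </product-id-template>
    </prot:data>
</xml>"""

# Ask ElementTree to use "prot" instead of a generated prefix such as ns0.
# This registry is global to the process.
ET.register_namespace("prot", "prot")

root = ET.fromstring(SAMPLE)
elem = root.find(".//prot:ProductId", {"prot": "prot"})

with_ns = ET.tostring(elem, encoding="unicode")
# Remove every xmlns declaration, then trim the trailing tail whitespace.
no_ns = re.sub(r' xmlns(:\w+)?="[^"]+"', "", with_ns).strip()
print(no_ns)
```

This still removes every declaration, including ones that were present on the element in the source, so it only fits elements like ProductId.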

UPDATE: There is a rather wild alternative, although I can't recommend it because I'm not a Python expert.

I checked the ElementTree source code and found that this hack works:

def my_serialize_xml(write, elem, qnames, namespaces,
                     short_empty_elements, **kwargs):
    ET._serialize_xml(write, elem, qnames,
                      None, short_empty_elements, **kwargs)

ET._serialize["xml"] = my_serialize_xml

I defined my_serialize_xml, which calls ElementTree._serialize_xml with namespaces=None. Then I replaced the value for the key "xml" in the ElementTree._serialize dictionary with my_serialize_xml, so ElementTree.tostring now uses my_serialize_xml. Both names start with an underscore, so they are private and may change between Python versions.

If you want to try it, place the code above after from xml.etree import ElementTree as ET, but before the first use of ET.
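
If replacing the serializer for the whole process feels too invasive, the same hack can be limited to a block of code. This is only a sketch: without_xmlns is a hypothetical helper name, and it relies on the same private ET._serialize dictionary as the code above, so it may break in other Python versions.

```python
from contextlib import contextmanager
from xml.etree import ElementTree as ET


@contextmanager
def without_xmlns():
    """Temporarily serialize XML without xmlns declarations (hypothetical helper)."""
    original = ET._serialize["xml"]  # private API, as in the hack above

    def serialize_without_xmlns(write, elem, qnames, namespaces,
                                short_empty_elements, **kwargs):
        # Passing None for namespaces makes the serializer skip the declarations.
        original(write, elem, qnames, None, short_empty_elements, **kwargs)

    ET._serialize["xml"] = serialize_without_xmlns
    try:
        yield
    finally:
        # Always restore the original serializer, even if an error occurred.
        ET._serialize["xml"] = original


elem = ET.fromstring('<a xmlns:p="urn:p"><p:b>x</p:b></a>').find("{urn:p}b")

with without_xmlns():
    inside = ET.tostring(elem, encoding="unicode")
outside = ET.tostring(elem, encoding="unicode")

print(inside)
print(outside)
```

Inside the with block the declaration is omitted; outside it, ET.tostring behaves as usual again.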

5 Comments

Can we achieve this without a regex? In some cases an element may carry a different xmlns, and the regex would remove that too, right? I need to retrieve the data exactly as it appears in the XML document. @Mike Kaskun
I tried to do it with ElementTree.tostring(), but it seems it is not possible. Maybe lxml can do it. I'll let you know if I find a better solution.
I've updated the answer; I can't find another way with xml.etree. If you can use lxml, you can find similar Q&As showing how to do it.
Thanks @Mike Kaskun. But it removes the xmlns everywhere. The xmlns should be kept where it appears in the source document: I added a dealer-template block to the question, and there the xmlns is needed and must not be removed. It should be removed only where it is not present in the source.
OK, I understand. I'll try something else later. Also, add your explanation to the question; maybe someone else will help.
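
One possible way to meet the requirement from the comments with xml.etree alone is to record, while parsing, which elements declare a namespace themselves, and then strip only the declarations that ElementTree adds on its own. The following is a rough sketch, not a tested general solution: parse_with_declarations and serialize_like_source are hypothetical helper names, default (unprefixed) namespaces are not handled, a declaration found on a descendant is moved to the outermost tag, and prefixes are registered globally.

```python
import re
from xml.etree import ElementTree as ET

XML = """<xml>
    <prot:data xmlns:prot="prot">
        <product-id-template>
            <prot:ProductId>PRODUCT_ID</prot:ProductId>
        </product-id-template>
        <dealer-template>
            <xsi:Dealer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">DEALER</xsi:Dealer>
        </dealer-template>
    </prot:data>
</xml>"""

NS = {"prot": "prot", "xsi": "http://www.w3.org/2001/XMLSchema-instance"}


def parse_with_declarations(xml_text):
    """Return (root, declared); declared maps an element to the
    (prefix, uri) pairs declared on that element in the source."""
    parser = ET.XMLPullParser(events=("start-ns", "start"))
    parser.feed(xml_text)
    parser.close()
    root, declared, pending = None, {}, []
    for event, payload in parser.read_events():
        if event == "start-ns":
            # Reported just before the "start" event of the declaring element.
            pending.append(payload)
        else:
            if root is None:
                root = payload
            if pending:
                declared[payload] = pending
                pending = []
    return root, declared


def serialize_like_source(elem, declared):
    """Serialize elem, keeping only xmlns declarations that the source
    places on elem or on one of its descendants."""
    # Reuse the source prefixes instead of ns0, ns1, ... (global registry).
    for pairs in declared.values():
        for prefix, uri in pairs:
            if prefix:  # default namespaces are out of scope for this sketch
                ET.register_namespace(prefix, uri)
    keep = {uri for e in elem.iter() for _, uri in declared.get(e, [])}
    # ET.tostring also writes the tail text, so hide it temporarily.
    tail, elem.tail = elem.tail, None
    try:
        text = ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail
    # ElementTree writes all declarations on the first start tag.
    head, sep, rest = text.partition(">")
    head = re.sub(r' xmlns(?::[^=\s]+)?="([^"]*)"',
                  lambda m: m.group(0) if m.group(1) in keep else "", head)
    return head + sep + rest


root, declared = parse_with_declarations(XML)
product = serialize_like_source(root.find(".//prot:ProductId", NS), declared)
dealer = serialize_like_source(root.find(".//xsi:Dealer", NS), declared)
print(product)
print(dealer)
```

With the sample above, ProductId is printed without a declaration and Dealer keeps its xmlns:xsi declaration. If lxml is an option, it may be worth checking whether it handles this case more directly.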
