1

Edit: others have responded showing xslt as a better solution for the simple problem I have posted here. I have deleted my answer for now.

I've been through about a dozen StackOverflow posts trying to understand how to import an XML document that has namespaces, modify it, and then write it without changing the namespaces. I discovered a few things that weren't clear or had conflicting information. Having finally got it to work I want to record what I learned hoping it helps someone else equally confused. I will put the question here and the answer in a response.

The question: given the sample XML data in the Python docs how do I navigate the tree without having to explicitly include the name-space URIs in the xpaths for findall and write it back out with the namespace prefixes preserved. The example code in the doc does not give the full solution.

Here is the XML data:

<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>

The desired output is just to add "Sir " in front of the actors' names like this:

<actors xmlns="http://people.example.com" xmlns:fictional="http://characters.example.com">
    <actor>
        <name>Sir John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Sir Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>

3 Answers 3

1

When parsing XML with lxml, the original namespace prefixes are preserved. The following code does what you want. Note the use of {*} as a namespace wildcard.

from lxml import etree
 
tree = etree.parse("data1.xml")
 
for name in tree.findall(".//{*}name"):
    name.text = "Sir " + name.text
Sign up to request clarification or add additional context in comments.

Comments

1

You can simply use XSLT-1.0 to handle the XML:

  1. Create the following XSLT-1.0 file (here named "trans.xslt") and pay attention to the fact that the xmlns=... namespace is defined as xmlns:std=... in the stylesheet element. The Identity template just copies all elements from the input to the output, and the other template adds the string "Sir " before the value of each <name> element.

     <?xml version="1.0" encoding="UTF-8"?>
     <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:std="http://people.example.com">
    
         <!-- Identity template -->
         <xsl:template match="node()|@*">
             <xsl:copy>
                 <xsl:apply-templates select="node()|@*" />
             </xsl:copy>
         </xsl:template>
    
         <xsl:template match="std:name">
           <xsl:copy>
             <xsl:value-of select="concat('Sir ',.)"/>
           </xsl:copy>
         </xsl:template>
    
     </xsl:stylesheet>
    
  2. Call this template from Python:

     import lxml.etree as ET
    
     xml_filename = "input.xml"
     xsl_filename = "trans.xslt"
     out_filename = "output.xml"
    
     dom = ET.parse(xml_filename)
     xslt = ET.parse(xsl_filename)
     transform = ET.XSLT(xslt)
     newdom = transform(dom)
     newdom.write(out_filename, encoding='utf-8', xml_declaration=True)
    

    The last command specifies the output format. The XML declaration with its encoding has to be specified on the write command (and not on a possible .tostring). Output is below:

These two steps generate the following output:

<?xml version='1.0' encoding='UTF-8'?>
<actors xmlns:fictional="http://characters.example.com" xmlns="http://people.example.com">
    <actor>
        <name>Sir John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Sir Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>

Comments

0

I would apply an XSLT to the XML

from lxml import etree

XSL= ‘’’
<xsl:stylesheet version=”1.0" xmlns:xsl=”http://www.w3.org/1999/XSL/Transform" 
   xmlns:people="http://people.example.com">

  <xsl:template match=”@*|node()”>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    <xsl:copy> 
  </xsl:template>

  <xsl:template match=”people:name/text()”>
    <xsl:text>Sir </xsl:text>
    <xsl:value-of select=”.” />
  </xsl:template>

</xsl:stylesheet>
‘’’
# load input
dom = etree.parse(‘/path/to/my.xml’)
# load XSLT
transform = etree.XSLT(etree.fromstring(XSL))
# apply XSLT on loaded dom
str(transform(dom))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.