1

I need to remove a specific tag from Apache Tomcat web.xml files.

web.xml

    <?xml version="1.0" encoding="ISO-8859-1"?>



<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0">

  <!-- ======================== Introduction ============================== -->
  <!-- This document defines default values for *all* web applications      -->
  <!-- loaded into this instance of Tomcat.  As each application is         -->
  <!-- deployed, this file is processed, followed by the                    -->
  <!-- "/WEB-INF/web.xml" deployment descriptor from your own               -->
  <!-- applications.                                                        -->
  <!--                                                                      -->
  <!-- WARNING:  Do not configure application-specific resources here!      -->
  <!-- They should go in the "/WEB-INF/web.xml" file in your application.   -->

     <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
   <servlet>
        <servlet-name>jsp</servlet-name>
        <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
        <init-param>
            <param-name>fork</param-name>
            <param-value>false</param-value>
        </init-param>
        <init-param>
            <param-name>xpoweredBy</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>3</load-on-startup>
    </servlet>

    <servlet>
        <servlet-name>cgi</servlet-name>
        <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
        <init-param>
          <param-name>debug</param-name>
          <param-value>0</param-value>
        </init-param>
        <init-param>
          <param-name>cgiPathPrefix</param-name>
          <param-value>WEB-INF/cgi</param-value>
        </init-param>
         <load-on-startup>5</load-on-startup>
    </servlet>
</web-app>

If servlet-name is cgi, I need to remove the entire servlet tag. My code is as follows:

    from xml.etree.ElementTree import ElementTree
    tree = ElementTree()
    tree.parse('web.xml')
    servlets = tree.findall('servlet')
    print "servlets : ",servlets
    for servlet in servlets:
      servlet_names = foo.findall('servlet-name')
      for servlet_name  in servlet_names:
            if servlet_name == "cgi" :
                    print "servlet_name :", servlet_name
                    servlet.remove(servlet-name)

The output is servlets : [] instead of the list of all servlets, so the for loop body never runs. Can anyone help?

I am not getting any exception.

#!/usr/bin/python
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                print "removed the cgi serverlet", root.remove(servlet)

=====output===============
servlets :  [<Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b35a8>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3878>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3bd8>]
servlet_name : cgi
removed the cgi serverlet None

I used the pdb debugger to inspect the servlet element; its text value shows as \n followed by whitespace:

> /apps/manu/python/manunamespace.py(10)<module>()
-> servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name')
(Pdb) servlet_name
<Element {http://java.sun.com/xml/ns/javaee}servlet-name at 882878>
(Pdb) servlet_name.text
'jsp'
(Pdb) n
> /apps/manu/python/manunamespace.py(11)<module>()
-> print "servlet_name:", servlet_name.text
(Pdb) servlet_name.text
'cgi'
(Pdb) servlet.text
'\n        '
(Pdb) n
servlet_name: cgi
> /apps/manu/python/manunamespace.py(12)<module>()
-> if servlet_name.text == "cgi":
(Pdb) n
> /apps/manu/python/manunamespace.py(13)<module>()
-> print "remove the element"
(Pdb) n
remove the element
> /apps/manu/python/manunamespace.py(14)<module>()
-> print "remove : ",root.remove(servlet)
(Pdb) servlet
<Element {http://java.sun.com/xml/ns/javaee}servlet at 882d88>
(Pdb) servlet.text
'\n        '
2
  • You're just not printing the changed tree, only the result of root.remove, which is None. The code works; you just need to do something with the tree after modifying it. To print it you can use tree.write(sys.stdout), or tree.write(open('filename', 'w')) to write it to a file. Commented Sep 5, 2015 at 9:03
  • Got it, and it is working now, but when I write the tree to a file it does not keep the namespace as written and it drops all the comments. Commented Sep 13, 2015 at 11:27
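
On the follow-up about the namespace and the comments being lost on write, here is a minimal sketch. It assumes Python 3.8 or newer (not the Python 2 used in this thread), because the insert_comments option of TreeBuilder only exists from 3.8 on. SAMPLE is a made-up, shortened stand-in for web.xml. register_namespace with an empty prefix asks ElementTree to serialize the namespace as the default one instead of inventing an ns0: prefix.

```python
import xml.etree.ElementTree as ET

NS = "http://java.sun.com/xml/ns/javaee"

# Hypothetical, shortened stand-in for the real web.xml.
SAMPLE = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <!-- default servlet -->
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

# Serialize NS as the default namespace rather than as "ns0:".
# Note: this registration is global for the whole process.
ET.register_namespace("", NS)

# Keep XML comments in the parsed tree (TreeBuilder option, Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)

for servlet in root.findall("{%s}servlet" % NS):
    if servlet.findtext("{%s}servlet-name" % NS) == "cgi":
        root.remove(servlet)  # returns None; the tree itself is modified

result = ET.tostring(root, encoding="unicode")
print(result)
```

With a real file, the same parser object could be passed to ET.parse('web.xml', parser=parser), and the tree written back with tree.write(...).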

2 Answers

1

This is failing:

servlets = tree.findall('servlet')

Because your document contains no elements named plain servlet (that is, servlet in no namespace). The root element specifies:

xmlns="http://java.sun.com/xml/ns/javaee"

Which means that all elements, unless otherwise specified, are in this XML namespace. So you want:

>>> tree.findall('{http://java.sun.com/xml/ns/javaee}servlet')
[<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec681b8>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec68200>, 
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec682d8>]
>>> 
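
To see why the plain name finds nothing, here is a small sketch, written for Python 3 and using a made-up one-servlet document rather than the real web.xml. ElementTree stores the namespace URI inside each tag name, so only the qualified name matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document with the same default namespace.
doc = ET.fromstring(
    '<web-app xmlns="http://java.sun.com/xml/ns/javaee">'
    '<servlet><servlet-name>cgi</servlet-name></servlet>'
    '</web-app>'
)

# The namespace URI is part of the tag, in {uri}localname form.
print(doc.tag)

plain = doc.findall("servlet")  # no namespace: matches nothing
qualified = doc.findall("{http://java.sun.com/xml/ns/javaee}servlet")
print(len(plain), len(qualified))
```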

1

You are not finding the tags you are searching for because they are in the default namespace (http://java.sun.com/xml/ns/javaee).

Also, if you want to test an element's content, you need to use its text attribute rather than compare against the element itself. And if it matches, you need to remove the servlet element from the root, not the servlet-name element from the servlet.
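
Both points can be checked in isolation. This is a minimal Python 3 sketch on a made-up one-servlet document; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://java.sun.com/xml/ns/javaee}"

# Hypothetical minimal document, not the real web.xml.
root = ET.fromstring(
    '<web-app xmlns="http://java.sun.com/xml/ns/javaee">'
    '<servlet><servlet-name>cgi</servlet-name></servlet>'
    '</web-app>'
)
servlet = root.find(NS + "servlet")
servlet_name = servlet.find(NS + "servlet-name")

element_matches = (servlet_name == "cgi")    # compares an Element with a str
text_matches = (servlet_name.text == "cgi")  # compares the element's text
print(element_matches, text_matches)

# remove() must be called on the direct parent of the element to drop.
root.remove(servlet)
print(len(root))
```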

Try this:

from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()

# Map a prefix of our own choosing to the document's default namespace.
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
servlets = root.findall('jee:servlet', nsmap)
print "servlets : ", servlets
for servlet in servlets:
    for servlet_name in servlet.findall('jee:servlet-name', nsmap):
        if servlet_name.text == "cgi":
            print "servlet_name :", servlet_name.text
            # Remove the whole <servlet> element from its parent (the root).
            root.remove(servlet)

Or, more concisely, using the XPath subset that ElementTree supports:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)
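
As a quick check of the predicate form, here is a sketch for Python 3 that wraps it in a helper and runs it on a made-up three-servlet document. remove_servlets_named is a hypothetical name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

NSMAP = {"jee": "http://java.sun.com/xml/ns/javaee"}


def remove_servlets_named(root, name):
    """Remove direct <servlet> children whose <servlet-name> equals name.

    Hypothetical helper; name must not contain a single quote, because it
    is pasted into the path expression as a quoted literal.
    """
    path = "./jee:servlet[jee:servlet-name='%s']" % name
    matches = root.findall(path, NSMAP)
    for servlet in matches:
        root.remove(servlet)
    return len(matches)


# Hypothetical shortened document, not the real web.xml.
root = ET.fromstring(
    '<web-app xmlns="http://java.sun.com/xml/ns/javaee">'
    '<servlet><servlet-name>default</servlet-name></servlet>'
    '<servlet><servlet-name>jsp</servlet-name></servlet>'
    '<servlet><servlet-name>cgi</servlet-name></servlet>'
    '</web-app>'
)

removed = remove_servlets_named(root, "cgi")
left = [e.text for e in root.findall("jee:servlet/jee:servlet-name", NSMAP)]
print(removed, left)
```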

Edit: For older Python versions (tested with Python 2.5):

from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()

# Older versions have no namespace-map argument, so spell out {uri}tag.
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ", servlets
for servlet in servlets:
    for servlet_name in servlet.findall(ns + 'servlet-name'):
        if servlet_name.text == "cgi":
            print "servlet_name :", servlet_name.text
            # Remove the whole <servlet> element from its parent (the root).
            root.remove(servlet)
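
At the other end of the version range, Python 3.8 and newer accept a {*} wildcard in tag names, which matches a tag in any namespace or in none. This is a minimal sketch under that version assumption, again on a made-up document:

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened document, not the real web.xml.
doc = ET.fromstring(
    '<web-app xmlns="http://java.sun.com/xml/ns/javaee">'
    '<servlet><servlet-name>jsp</servlet-name></servlet>'
    '<servlet><servlet-name>cgi</servlet-name></servlet>'
    '</web-app>'
)

# "{*}servlet" matches servlet in any namespace (requires Python 3.8+).
for servlet in doc.findall("{*}servlet"):
    if servlet.findtext("{*}servlet-name") == "cgi":
        doc.remove(servlet)

remaining = [s.findtext("{*}servlet-name") for s in doc.findall("{*}servlet")]
print(remaining)
```

This avoids hard-coding the namespace URI, at the cost of also matching same-named elements from unrelated namespaces.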

8 Comments

I see the exception below in two scenarios. 1) for servlet in root.findall('jee:servlet', nsmap): TypeError: findall() takes exactly 2 arguments (3 given). 2) for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap): . Can you suggest what the issue might be?
The code below runs but does not remove the element: import xml.etree.ElementTree as ET tree = ET.parse('web.xml') root = tree.getroot() #root = ET.fromstring('web.xml') for servlet in root.findall('{http://java.sun.com/xml/ns/javaee}servlet'): print "servlet : ", servlet.text servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name') print "servlet_name:", servlet_name.text if servlet_name.text == "cgi": print "remove the element" root.remove(servlet) print "servlet ", servlet.text
You're probably using an outdated version of Python (2.6 or older), which didn't support namespace prefixes or the extended syntax for findall. If you can't update Python, use the first variant with the full namespace syntax (replace jee: with {http://java.sun.com/xml/ns/javaee}).
Okay, Mata. But I am still unable to delete the element with the code above.
Of course you also need to remove the nsmap argument. I added a full example that works on Python 2.5.