18

I am parsing an XML file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks like this:

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
</sbml>

The issue is that lxml only declares a namespace where it is used, which means the declaration is repeated many times, like so (simplified):

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4">
  <listOfSpecies>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
    </species>
    ....
  </listOfSpecies>
</sbml>
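A minimal reproduction of the repeated declarations, assuming a stripped-down document that keeps only `sbml/model/listOfSpecies/species` (the namespace URIs come from the question; the rest is trimmed for illustration). Each `kjw:test` is created as a standalone element, so it carries its own `xmlns:kjw`, and because no ancestor declares `kjw`, lxml keeps that declaration when the element is appended:

```python
from lxml import etree

SBML = "http://www.sbml.org/sbml/level2/version4"
KJW = "http://this.is.some/custom_namespace"

# Trimmed-down input: the root declares the SBML default namespace, but not kjw.
DOC = (
    '<sbml xmlns="%s" level="2" version="4"><model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML
)

root = etree.fromstring(DOC)
for species in root.iter("{%s}species" % SBML):
    # The new element is created on its own, so it gets its own xmlns:kjw.
    # No ancestor declares kjw, so the declaration is kept after appending.
    species.append(etree.Element("{%s}test" % KJW, nsmap={"kjw": KJW}))

out = etree.tostring(root, encoding="unicode")
print(out)
```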

Is it possible to force lxml to write this declaration only once in a parent element, such as sbml or listOfSpecies? Or is there a good reason not to do so? The result I want would be:

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4"  xmlns:kjw="http://this.is.some/custom_namespace">
  <listOfSpecies>
    <species>
      <kjw:test/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test/>
    </species>
    ....
  </listOfSpecies>
</sbml>

The important constraint is that the existing data read from the file must be kept, so I cannot just create a new root element (I think?).

EDIT: Code attached below.

def annotateSbml(sbml_input):
  from lxml import etree

  checkSbml(sbml_input) # Makes sure the input is valid sbml/xml.

  ns = "http://this.is.some/custom_namespace"
  etree.register_namespace('kjw', ns)

  sbml_doc = etree.ElementTree()
  root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True))
  nsmap = root.nsmap
  nsmap['sbml'] = nsmap[None] # Makes code more readable, but seems ugly. Any alternatives to this?
  nsmap['kjw'] = ns
  ns = '{' + ns + '}'
  sbmlns = '{' + nsmap['sbml'] + '}'

  for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap):
    species.append(etree.Element(ns + 'test'))

  sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True)

  return
3 Comments
  • @Marcin: done. Any tips? Commented Jul 5, 2012 at 16:06
  • @mzjin my input contains everything except the <kjw:test/> tags. The aim is to insert such tags (or similar, e.g. kjw:score or kjw:length) into each species in this list. Does this make sense, or should I post the whole file (I figured my original question was long enough as it is)? Commented Jul 5, 2012 at 16:30
  • @mzjin Ah sorry, oversimplified that a bit. Yes it does indeed contain model tags. I've used the sbml:model tags together with nsmap['sbml'] = nsmap[None] so the parser properly substitutes the namespace in model with the root namespace, which it doesn't seem to otherwise. Commented Jul 5, 2012 at 17:25

6 Answers

14

Modifying the namespace mapping of an existing node is not possible in lxml: the nsmap property is read-only (it returns a new dict on every access). There is an open ticket that lists this feature as a wishlist item.

It originated from a thread on the lxml mailing list, where replacing the root node is suggested as a workaround. Replacing the root node has some issues of its own, though; see the ticket mentioned above.

I'll put the suggested root replacement workaround code here for completeness:

>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
...   <model metaid="untitled" id="untitled">
...     <annotation>...</annotation>
...     <listOfUnitDefinitions>...</listOfUnitDefinitions>
...     <listOfCompartments>...</listOfCompartments>
...     <listOfSpecies>
...       <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...       <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
...         <annotation>
...            <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...     </listOfSpecies>
...     <listOfReactions>...</listOfReactions>
...   </model>
... </sbml>"""
>>> 
>>> from lxml import etree
>>> from StringIO import StringIO
>>> NS = "http://this.is.some/custom_namespace"
>>> tree = etree.ElementTree(element=None, file=StringIO(DOC))
>>> root = tree.getroot()
>>> nsmap = root.nsmap
>>> nsmap['kjw'] = NS
>>> new_root = etree.Element(root.tag, attrib=root.attrib, nsmap=nsmap)
>>> new_root[:] = root[:]
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))

>>> print etree.tostring(new_root, pretty_print=True)
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4"><model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
<kjw:test/><kjw:test/></sbml>

2 Comments

For future reference, this requires a small alteration (on Python 3.2 at least); otherwise it raises a TypeError from **root.nsmap when it reaches the None: 'namespace' entry, because None is not a string. Using nsmap = root.nsmap; nsmap['kjw'] = NS; new_root = etree.Element(root.tag, nsmap=nsmap) works.
You also need to copy attrib, text, and (unlikely to matter, but for completeness) tail. nsmap=dict(kjw=NS, nsmap=nsmap) is wrong; it should be just nsmap=nsmap.
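Putting the fixes from these comments together, here is a Python 3 sketch of the root-replacement workaround that also carries over `attrib`, `text` and `tail`. The helper name `replace_root_with_ns` and the trimmed-down input are illustrative assumptions, not part of lxml. It only rebuilds the root element, so anything outside it (a DOCTYPE, or comments and processing instructions next to the root) is not carried over:

```python
from lxml import etree

SBML = "http://www.sbml.org/sbml/level2/version4"
KJW = "http://this.is.some/custom_namespace"


def replace_root_with_ns(tree, prefix, uri):
    """Return a new ElementTree whose root additionally declares prefix -> uri.

    Hypothetical helper: lxml cannot change an existing element's nsmap, so
    this builds a new root with the extended nsmap and moves the children in.
    """
    old_root = tree.getroot()
    nsmap = dict(old_root.nsmap)  # includes the None (default namespace) entry
    nsmap[prefix] = uri
    new_root = etree.Element(old_root.tag, attrib=old_root.attrib, nsmap=nsmap)
    new_root.text = old_root.text
    new_root.tail = old_root.tail
    new_root[:] = old_root[:]  # slice assignment moves the children, not copies
    return etree.ElementTree(new_root)


DOC = (
    '<sbml xmlns="%s" level="2" version="4"><model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML
)

tree = replace_root_with_ns(etree.ElementTree(etree.fromstring(DOC)), "kjw", KJW)
root = tree.getroot()
for species in root.iter("{%s}species" % SBML):
    # SubElement looks for an existing declaration among the ancestors,
    # so it reuses the xmlns:kjw that is now declared on the root.
    etree.SubElement(species, "{%s}test" % KJW)

out = etree.tostring(tree, encoding="unicode")
print(out)
```

Because the declaration now lives on the root, every `kjw:test` is serialized as a plain `<kjw:test/>`.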
8

I know this is an old question, but it is still relevant, and as of lxml 3.5.0 there is probably a better solution to this problem:

cleanup_namespaces() accepts a new argument top_nsmap that moves definitions of the provided prefix-namespace mapping to the top of the tree.

So now the namespace declaration can be moved up with a simple call:

nsmap = {'kjw': 'http://this.is.some/custom_namespace'}
etree.cleanup_namespaces(root, top_nsmap=nsmap)
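A self-contained sketch of this approach, assuming lxml 3.5.0 or newer and a trimmed-down document using the question's `kjw` URI: the annotations are first appended the same way as in the question (so each one carries its own declaration), then `cleanup_namespaces()` with `top_nsmap` declares `kjw` on the root and drops the now-redundant copies:

```python
from lxml import etree

SBML = "http://www.sbml.org/sbml/level2/version4"
KJW = "http://this.is.some/custom_namespace"

DOC = (
    '<sbml xmlns="%s" level="2" version="4"><model><listOfSpecies>'
    '<species id="s1"/><species id="s2"/>'
    '</listOfSpecies></model></sbml>' % SBML
)

root = etree.fromstring(DOC)
for species in root.iter("{%s}species" % SBML):
    # Standalone elements: each one brings its own xmlns:kjw declaration.
    species.append(etree.Element("{%s}test" % KJW, nsmap={"kjw": KJW}))

before = etree.tostring(root, encoding="unicode")

# Declare kjw on the top element, then strip declarations that are no longer needed.
etree.cleanup_namespaces(root, top_nsmap={"kjw": KJW})

after = etree.tostring(root, encoding="unicode")
print(before)
print(after)
```

Unlike replacing the root, this works in place, so the existing tree (and any references to it) stays valid.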


3

Rather than dealing directly with the raw XML, you could also look at libSBML, a library for manipulating SBML documents with language bindings for, among others, Python. There you would use it like this:

>>> from libsbml import *
>>> doc = readSBML('Dropbox/SBML Models/BorisEJB.xml')
>>> species = doc.getModel().getSpecies('MAPK')
>>> species.appendAnnotation('<kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>')
0
>>> species.toSBML()
'<species id="MAPK" compartment="compartment" initialConcentration="280" boundaryCondition="false">\n  <annotation>\n <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>\n  </annotation>\n</species>'
>>>


2

I wrote this function to add a namespace to the root element:

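from lxml import etree

def addns(tree, alias, uri):
    """Return a new tree whose root also declares alias -> uri."""
    root = tree.getroot()
    nsmap = root.nsmap  # a fresh dict; the element's own nsmap is read-only
    nsmap[alias] = uri
    new_root = etree.Element(root.tag, attrib=root.attrib, nsmap=nsmap)
    new_root.text = root.text  # keep any text/whitespace before the first child
    new_root[:] = root[:]  # moves the children (with their tails) to the new root
    return new_root.getroottree()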

The function returns a new tree instead of modifying the existing one, so you have to replace your reference to the old tree with the returned one. That is easy if all access to the tree goes through a single object.


1

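Temporarily adding a namespaced attribute to the root node does the trick: it makes lxml declare the namespace on the root, and elements in that namespace added further down reuse that declaration.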

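from lxml import etree

# Register the prefix so lxml uses 'kjw' rather than an auto-generated 'ns0'
etree.register_namespace('kjw', 'http://this.is.some/custom_namespace')
ns = '{http://this.is.some/custom_namespace}'
sbmlns = '{http://www.sbml.org/sbml/level2/version4}'

root = etree.parse('input.sbml.xml').getroot()  # hypothetical input file

# add 'kjw:foobar' attribute to root node; this declares xmlns:kjw on the root
root.set(ns + 'foobar', 'foobar')

# add kjw namespace elements (or attributes) elsewhere; they reuse that declaration
for species in root.iter(sbmlns + 'species'):
    species.append(etree.Element(ns + 'test'))

# remove temporary namespaced attribute; the declaration itself stays on the root
del root.attrib[ns + 'foobar']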


0

You could replace the root element to add 'kjw' to its nsmap. The xmlns declaration would then appear only on the root element.

