Using python, sort XML alphabetically except one element

Question

I'm trying to sort my XML alphabetically while ensuring that a specific element stays at the top. I have managed to sort it alphabetically, but I cannot get that element to stay. Here is what I have so far:

from lxml import etree

data = """
<Example xmlns="http://www.example.org">
    <E>
        <A>A</A>
        <B>B</B>
        <C>C</C>
    </E>
    <B>B</B>
    <D>D</D>
    <A>A</A>
    <C>C</C>
    <F>F</F>
</Example>
"""
doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,key=lambda x: x.tag)

print(etree.tostring(doc,pretty_print=True).decode('ascii'))

The result from this is:

<Example xmlns="http://www.example.org">
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <F>F</F>
</Example>

Is there any way I can stop the <E></E> part and its contents from moving?

What is it about <E> that makes it an element which should not be sorted? Is it because it has child nodes?
– James, Commented Sep 6, 2017 at 16:40
@James Nope, the child nodes do not matter. I want to make the XML conform to a given schema, which requires that <E> stays at the top, but I wish to sort the rest alphabetically.
– Uwot12, Commented Sep 6, 2017 at 16:51

James · Accepted Answer · 2017-09-06 17:26:03Z

2

You can handle this in at least two ways. You could sort everything and force <E> to the top with a custom sort key. Alternatively, you could split off the elements to be sorted, sort them, and append them after the unsorted elements.

Custom sort:

Strings are sorted by comparing code points, character by character. You can get the code point of a single character using ord(). The tab has code point 9, which is lower than any character that can appear in a tag. So we can tell Python to sort all of the elements by tag as usual, except <E>, whose sort key is a tab so that it sorts first.

There is some extra code to handle the namespace.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,key=lambda x: x.tag if x.tag!='{'+ns[None]+'}E' else '\t')

print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
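
To see why the tab trick works, here is a small sketch that uses plain strings in place of elements (the tags list and the sort_key name are made up for illustration). A namespaced tag starts with "{", and the tab has a lower code point than "{" or any letter, so the substituted key sorts first.

```python
# Illustration only: plain strings stand in for element tags.
NS = "{http://www.example.org}"
tags = [NS + t for t in ["B", "D", "E", "A"]]

def sort_key(tag):
    # Substitute a tab for the pinned tag; every other tag sorts as itself.
    return "\t" if tag == NS + "E" else tag

print(ord("\t"), ord("{"), ord("A"))   # 9 123 65
ordered = sorted(tags, key=sort_key)
print([t[len(NS):] for t in ordered])  # ['E', 'A', 'B', 'D']
```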

Split, apply, combine

Here we split the parent into two lists, sort the second list, and then merge them.

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))
ns = doc.nsmap
for parent in doc.xpath('//*[./*]'):
    to_sort = (e for e in parent if e.tag!='{'+ns[None]+'}E')
    non_sort = (e for e in parent if e.tag=='{'+ns[None]+'}E')
    parent[:] = list(non_sort) + sorted(to_sort, key=lambda e: e.tag)
print(etree.tostring(doc,pretty_print=True).decode('ascii'))

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
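
Note that the loop visits every element that has children, including <E> itself, so the children of <E> are sorted as well. If they should keep their original order, one option is to skip <E> when it is the parent. The sketch below is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree so that it runs without lxml, the input has the children of <E> deliberately out of order, and PINNED and local are names made up for this example.

```python
import xml.etree.ElementTree as ET

data = """
<Example xmlns="http://www.example.org">
    <B>B</B>
    <E>
        <C>C</C>
        <A>A</A>
    </E>
    <A>A</A>
</Example>
"""
root = ET.fromstring(data)
PINNED = "{http://www.example.org}E"  # fully qualified tag to keep on top

# Snapshot the elements first, because children are reordered in the loop.
for parent in list(root.iter()):
    if len(parent) == 0 or parent.tag == PINNED:
        continue  # leaves have nothing to sort; <E> keeps its own order
    pinned = [e for e in parent if e.tag == PINNED]
    others = [e for e in parent if e.tag != PINNED]
    parent[:] = pinned + sorted(others, key=lambda e: e.tag)

def local(tag):
    # Strip the '{namespace}' prefix, if any.
    return tag.rpartition("}")[2]

print([local(e.tag) for e in root])     # ['E', 'A', 'B']
print([local(e.tag) for e in root[0]])  # ['C', 'A']
```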

answered Sep 6, 2017 at 17:26

James


1 Comment

Uwot12 Over a year ago

Fantastic. Thanks for both of the methods! I like the second one. When I try the second method, it also sorts the child nodes inside the non_sort list. Should it sort that list? I thought it wouldn't as that was not included in the sorted() function. I forgot to include it in the question, but I'm actually not looking to sort the child nodes inside <E>, so that'd be ideal.

PRMoureu · Accepted Answer · 2017-09-06 17:56:46Z

2

It can be done as follows. The short tag name is not directly available on the element, so the key compares against the full tag, including the namespace from xmlns:

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,
                       key=lambda x: (x.tag != '{http://www.example.org}E', x.tag))

print(etree.tostring(doc,pretty_print=True,encoding='unicode'))

This code will output:

<Example xmlns="http://www.example.org">
  <E>
    <A>A</A>
    <B>B</B>
    <C>C</C>
  </E>
  <A>A</A>
  <B>B</B>
  <C>C</C>
  <D>D</D>
  <F>F</F>
</Example>
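
The sort key here is a tuple. Python compares tuples element by element, and False sorts before True, so the one element whose tag matches gets a key starting with False and comes first, while everything else falls back to ordering by tag. A small sketch with plain strings standing in for elements (the tags list and the key function name are made up for illustration):

```python
# Illustration only: plain strings stand in for element tags.
NS = "{http://www.example.org}"
tags = [NS + t for t in ["B", "D", "E", "A"]]

def key(tag):
    # First item is False only for the pinned tag, so it wins every comparison.
    return (tag != NS + "E", tag)

print(key(NS + "E")[0], key(NS + "A")[0])  # False True
ordered = sorted(tags, key=key)
print([t[len(NS):] for t in ordered])      # ['E', 'A', 'B', 'D']
```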

The following code just prints these full tags to show what they look like:

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    for item in parent:
        print(item.tag)

{http://www.example.org}E
{http://www.example.org}B
{http://www.example.org}D
{http://www.example.org}A
{http://www.example.org}C
{http://www.example.org}F
{http://www.example.org}A
{http://www.example.org}B
{http://www.example.org}C

Another way is to use a helper function that strips the namespace from the tag, which makes the key more readable:

def normalize(name):
    if name[0] == "{":
        uri, tag = name[1:].split("}")
        return tag
    else:
        return name

doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent,
                       key=lambda x: (not normalize(x.tag) == 'E', x.tag))

edited Sep 6, 2017 at 17:56

answered Sep 6, 2017 at 17:25

PRMoureu


1 Comment

Uwot12 Over a year ago

Fantastic, thank you! Is there any way to custom sort the ordering inside <E>? I forgot to mention that the ordering of the child nodes inside it must be specific, rather than alphabetical.
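
For the follow-up about a specific order inside <E>, one possible approach is to sort the children of <E> by their position in a list of tag names and sort every other parent as before. This is only a sketch under stated assumptions: E_ORDER is a hypothetical order (the required one is not given in the question), local and e_key are names made up here, and the code uses the standard library's xml.etree.ElementTree rather than lxml so that it runs on its own.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.example.org}"
data = """
<Example xmlns="http://www.example.org">
    <B>B</B>
    <E>
        <A>A</A>
        <B>B</B>
        <C>C</C>
    </E>
    <A>A</A>
</Example>
"""
root = ET.fromstring(data)

E_ORDER = ["C", "A", "B"]  # hypothetical; replace with the order the schema requires

def local(tag):
    # Strip the '{namespace}' prefix, if any.
    return tag.rpartition("}")[2]

def e_key(elem):
    name = local(elem.tag)
    # Tags missing from E_ORDER go last, in alphabetical order.
    pos = E_ORDER.index(name) if name in E_ORDER else len(E_ORDER)
    return (pos, name)

# Snapshot the elements first, because children are reordered in the loop.
for parent in list(root.iter()):
    if len(parent) == 0:
        continue  # nothing to sort
    if parent.tag == NS + "E":
        parent[:] = sorted(parent, key=e_key)
    else:
        parent[:] = sorted(parent, key=lambda e: (e.tag != NS + "E", e.tag))

print([local(e.tag) for e in root])     # ['E', 'A', 'B']
print([local(e.tag) for e in root[0]])  # ['C', 'A', 'B']
```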
