Python XML Sorting by Attribute/Children

Question

I'm using Python (2.7/3.8) and working with some complex XML documents that are compared against each other. The element order can differ between documents, so I'm building a function that acts as a sorting rule (looking at node attributes first, then node children).

I've looked at a few related questions, but none of them work for my scenario.

I'm able to sort using key=lambda child: child.tag; however, I generally want to sort by attributes rather than the tag name.

In the most basic case, I want to sort by an attribute: check whether any of ['id', 'label', 'value'] exists as an attribute and use its value as the key. Regardless, I can't figure out why child.tag works as a sort key but child.get('id') does not.

import xml.etree.ElementTree as etree
    
input = '''
    <root>
        <node id="7"></node>
        <node id="10"></node>
        <node id="5"></node>
    </root>
'''

root = etree.fromstring(input)

root[:] = sorted(root, key=lambda child: child.get('id'))

xmlstr = etree.tostring(root, encoding="utf-8", method="xml")
print(xmlstr.decode("utf-8"))

Which returns:

<root>
    <node id="7" />
    <node id="5" />
    <node id="10" />
</root>

Expected:

<root>
    <node id="5" />
    <node id="7" />
    <node id="10" />
</root>

EDIT

As deadshot mentioned, wrapping child.get('id') with int() does fix the issue. However, the code also has to work for ids that mix letters and numbers, for example id="node1", id="node15", etc.

For example:

<root>
    <node id="node10" />
    <node id="node7" />
    <node id="node5" />
</root>

Expected:

<root>
    <node id="node5" />
    <node id="node7" />
    <node id="node10" />
</root>

Can you post the example with values id="node1", "node15" and the expected output?
– deadshot, Commented Sep 26, 2020 at 6:38
@deadshot - Posted. Appreciate the help. It looks like I need to look into natural sorting, so I'll start on that.
– user2288151, Commented Sep 26, 2020 at 6:42
@user2288151: Please keep Questions and Answers separate. If you have an alternative or more elaborate solution, post it as a new Answer.
– mzjn, Commented Sep 27, 2020 at 5:51

deadshot · Accepted Answer · 2020-09-26 06:45:27Z

0

You should convert the id value to int. You can use a regex to extract the digits from the id:

import re

# Sort numerically by the first run of digits in each id.
root[:] = sorted(root, key=lambda child: int(re.search(r'\d+', child.get('id')).group()))

xmlstr = etree.tostring(root, encoding="utf-8", method="xml")
print(xmlstr.decode("utf-8"))

Output:

<root>
    <node id="node5" />
    <node id="node7" />
    <node id="node10" />
</root>
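
For context on why the original key misbehaved: child.get('id') returns a string, and Python compares strings character by character, so "10" sorts before "5". Below is a minimal sketch of the difference using plain lists rather than XML. The names ids and numeric_part are illustrative and not from the posts above. The helper also guards against a missing attribute or an id without digits, either of which would make re.search(...).group() fail in the one-liner; returning -1 in that case is an assumption about where such nodes should land.

```python
import re

ids = ["7", "10", "5"]

# Strings compare lexicographically: "1" < "5" < "7", so "10" comes first.
as_text = sorted(ids)

# Converting to int compares by numeric value instead.
as_numbers = sorted(ids, key=int)


def numeric_part(value, default=-1):
    """Return the first run of digits in value as an int, else default."""
    # `value or ""` covers child.get('id') returning None.
    match = re.search(r"\d+", value or "")
    return int(match.group()) if match else default


# Entries without digits (or missing entirely) get the default key.
mixed = sorted(["node10", None, "node7", "node5"], key=numeric_part)

print(as_text)
print(as_numbers)
print(mixed)
```

With a helper like this, the sort call could become key=lambda child: numeric_part(child.get('id')).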

edited Sep 26, 2020 at 6:45

answered Sep 26, 2020 at 6:21


2 Comments

user2288151 Over a year ago

Thank you! Solves that mystery... Now, how would I make this work for both ints/strings, e.g. if id was <node id="node5" />

user2288151 Over a year ago

I've updated the main question with an answer that will work for strings with format test, test123, 123, etc.

user2288151 · Accepted Answer · 2020-09-27 14:07:01Z

0

To build on deadshot's method, I'm using the split_key function below. It takes a string of any form (test, test123, 123) and splits it into its letter and number portions as a tuple, which sorted can then compare directly.

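import re

# Lazy text prefix followed by an optional run of trailing digits.
KEY_RE = re.compile(r'^(?P<letters>.*?)(?P<numbers>\d*)$')

def split_key(key):
    match = KEY_RE.search(key)
    # An empty digit group falls back to 0 so int() never receives ''.
    return (match.group('letters'), int(match.group('numbers') or 0))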
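
As a usage sketch, the key can be combined with the attribute fallback described in the question (trying id, then label, then value). The names KEY_ATTRS and sort_key are hypothetical, split_key is repeated so the block runs on its own, and giving a node with none of those attributes an empty key is an assumption rather than something the posts specify.

```python
import re
import xml.etree.ElementTree as etree

KEY_ATTRS = ("id", "label", "value")  # hypothetical priority order


def split_key(key):
    # Text prefix plus trailing number (0 when there are no trailing digits).
    match = re.search(r"^(?P<letters>.*?)(?P<numbers>\d*)$", key)
    return (match.group("letters"), int(match.group("numbers") or 0))


def sort_key(child):
    # Use the first attribute from KEY_ATTRS that the node actually has.
    for name in KEY_ATTRS:
        value = child.get(name)
        if value is not None:
            return split_key(value)
    return ("", 0)  # assumption: nodes without any key attribute sort first


root = etree.fromstring(
    '<root><node id="node10"/><node label="node7"/><node id="node5"/></root>'
)
root[:] = sorted(root, key=sort_key)

sorted_values = [child.get("id") or child.get("label") for child in root]
print(sorted_values)
```

Only the trailing digits are treated as a number here, so an id with digits in the middle (for example "a10b2") still compares its prefix "a10b" as plain text.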

answered Sep 27, 2020 at 14:07
