
I'm using Python (2.7/3.8) and working with some complex XML documents that are compared against each other. The order of their elements can differ, so I'm building a function that acts as a sorting rule (looking at node attributes first, then at node children).

I've looked at a few related questions, but none of them work for my scenario.

I'm able to sort using key=lambda child: child.tag, but I generally want to sort by the attributes rather than the tag name.

In the most basic case, I want to sort by attribute: check whether any of ['id', 'label', 'value'] exists as an attribute and use its value as the key. Regardless of that, I can't figure out why child.tag works as a sort key but child.get('id') does not.

import xml.etree.ElementTree as etree
    
input = '''
    <root>
        <node id="7"></node>
        <node id="10"></node>
        <node id="5"></node>
    </root>
'''

root = etree.fromstring(input)

root[:] = sorted(root, key=lambda child: child.get('id'))

xmlstr = etree.tostring(root, encoding="utf-8", method="xml")
print(xmlstr.decode("utf-8"))

Which returns:

<root>
    <node id="10" />
    <node id="5" />
    <node id="7" />
</root>

Expected:

<root>
    <node id="5" />
    <node id="7" />
    <node id="10" />
</root>

EDIT

As deadshot mentioned, wrapping child.get('id') in int() does fix the issue. However, the code also has to work for ids that mix letters and numbers, for example id="node1", id="node15", etc.

For example:

<root>
    <node id="node10" />
    <node id="node7" />
    <node id="node5" />
</root>

Expected:

<root>
    <node id="node5" />
    <node id="node7" />
    <node id="node10" />
</root>
  • can you post the example with values id="node1", "node15" and expected output Commented Sep 26, 2020 at 6:38
  • @deadshot - Posted. Appreciate the help. It looks like I need to look in to natural sorting, so I'll start on that. Commented Sep 26, 2020 at 6:42
  • @user2288151: Please keep Questions and Answers separate. If you have an alternative or more elaborate solution, post it as a new Answer. Commented Sep 27, 2020 at 5:51
  • @mzjn, separated. Thanks Commented Sep 27, 2020 at 14:10

2 Answers


The id values are compared as strings, so convert them to int. If an id also contains letters, you can use a regex to extract the digits first:

import re


root[:] = sorted(root, key=lambda child: int(re.search(r'\d+', child.get('id')).group()))

xmlstr = etree.tostring(root, encoding="utf-8", method="xml")
print(xmlstr.decode("utf-8"))

Output:

<root>
    <node id="node5" />
    <node id="node7" />
    <node id="node10" />
</root>
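
One caveat with the one-liner above: re.search returns None when an id contains no digits (or the attribute is missing), so calling .group() raises AttributeError. Below is a minimal sketch of a more defensive key. The helper name numeric_part, the fallback value of 0, and the extra id="intro" node are assumptions made for illustration, not part of the original answer.

```python
import re
import xml.etree.ElementTree as etree


def numeric_part(child, attr="id"):
    """Return the first run of digits in the attribute as an int, or 0 if there is none."""
    # child.get(attr, "") avoids passing None to re.search when the attribute is absent.
    match = re.search(r"\d+", child.get(attr, ""))
    return int(match.group()) if match else 0


root = etree.fromstring(
    '<root><node id="node10"/><node id="node7"/><node id="intro"/><node id="node5"/></root>'
)

# Reorder the children in place, smallest number first.
root[:] = sorted(root, key=numeric_part)

ids = [child.get("id") for child in root]
print(ids)  # ['intro', 'node5', 'node7', 'node10']
```

Nodes without digits sort first here because they all map to 0. Note that this key ignores the letters entirely, so "a5" and "b5" compare as equal and keep their original relative order.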

2 Comments

Thank you! Solves that mystery... Now, how would I make this work for both ints/strings, e.g. if id was <node id="node5" />
I've updated the main question with an answer that will work for strings with format test, test123, 123, etc.

To build further on deadshot's method, I'm using the split_key function below. It takes a string of any form (test, test123, 123) and splits it into its letter and number portions as a tuple, which sorted can then compare element by element.

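import re

# Lazily match any leading text, then capture the trailing run of digits (if any).
KEY_RE = re.compile(r'^(?P<letters>.*?)(?P<numbers>\d*)$')

def split_key(key):
    # 'test123' -> ('test', 123); 'test' -> ('test', 0); '123' -> ('', 123)
    match = KEY_RE.search(key)
    return (match.group('letters'), int(match.group('numbers') or 0))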
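
To tie this back to the question, here is a rough sketch of how split_key could be combined with the ['id', 'label', 'value'] lookup to form a single sort key. The sort_key helper, the KEY_ATTRS name, the fallback to the tag name, and the sample XML are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as etree

# Attribute names from the question, checked in priority order.
KEY_ATTRS = ("id", "label", "value")
KEY_RE = re.compile(r"^(?P<letters>.*?)(?P<numbers>\d*)$")


def split_key(key):
    """Split 'test123' into ('test', 123); a missing number becomes 0."""
    match = KEY_RE.search(key)
    return (match.group("letters"), int(match.group("numbers") or 0))


def sort_key(child):
    """Use the first attribute from KEY_ATTRS that exists on the element."""
    for name in KEY_ATTRS:
        value = child.get(name)
        if value is not None:
            return split_key(value)
    # Assumption: elements with none of the attributes fall back to their tag name.
    return split_key(child.tag)


root = etree.fromstring(
    '<root><node id="node10"/><node label="node7"/><node id="5"/><node id="node5"/></root>'
)
root[:] = sorted(root, key=sort_key)

order = [sort_key(child) for child in root]
print(order)  # [('', 5), ('node', 5), ('node', 7), ('node', 10)]
```

Because the key is always a (str, int) tuple, purely numeric ids such as "5" sort ahead of prefixed ones, and this works on both Python 2.7 and 3.8. Only a trailing run of digits is treated as a number, so an id like "a1b2" sorts as ('a1b', 2).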
