I'm trying to find the maximum value in a string of hex numbers. My approach is to tokenize the string, convert the list of tokens into ints, and then take the max value.

The string is formatted like so:

'\x1e\x00\x00\x00\xf0\x0f184203308373388492761797873987'

I cannot control the format because it is the output of the Python binding of the LZ4 algorithm.

Other similar answers on SO don't have mixed types or use escape characters in a string with many hex numbers.

So, how do I turn that into a list such as:

[0x1e, 0x00, 0x00, ...]

Thank you for your help.

  • Are you referring to the output of lz4.compress()? If so, the string isn't a list of hex numbers. lz4.compress("z"*10) == '\n\x00\x00\x00\xa0zzzzzzzzzz', for example. Commented Jun 7, 2012 at 16:03
  • @DSM I was referring to the output of lz4.dumps(). But, the output looks similar, except the ending is integers in my example. What are the parts of your string after '\x'? Commented Jun 7, 2012 at 16:24
  • 2
    The same thing holds: consider lz4.dumps("z"*10) == '\n\x00\x00\x00\xa0zzzzzzzzzz'. I agree that the two digits after the "x" are hexadecimal, but which characters get them and which don't is a fluke of encoding. Look at lz4.dumps("\xff"*10), for example. Are you simply after max(ord(x) for x in s)? Commented Jun 7, 2012 at 16:29

1 Answer

I'm not sure how you want to handle the integers after the hex values. Are they supposed to be 1, 2, or x digits?

So, I do this:

import re

# convert unicode or string to raw
def raw(s):
    if isinstance(s, str):
        s = s.encode('string-escape')
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape')
    return s

s = '\x1e\x00\x00\x00\xf0\x0f184203308373388492761797873987'

print [ re.sub(r'\\', r'0', raw(i)) for i in s]

And I get this:

['0x1e', '0x00', '0x00', '0x00', '0xf0', '0x0f', '1', '8', '4', '2', '0', '3', '3', '0', '8', '3', '7', '3', '3', '8', '8', '4', '9', '2', '7', '6', '1', '7', '9', '7', '8', '7', '3', '9', '8', '7']

Hope that helps.

edit: simplified the list comprehension

edit: if you indeed want to get rid of non hex values, then you could use

>>> print [int(re.sub(r'\\', r'0', raw(i)), 16) for i in s if len(raw(i))>1]
[30, 0, 0, 0, 240, 15]

and then take the max of that list. Or, far better, as DSM suggested, compare the characters directly:

>>> s = '\x1e\x00\x00\x00\xf0\x0f184203308373388492761797873987'
>>> ord(max(s))
240
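If the list of integers from the question is still wanted, a minimal sketch along the same lines is to treat the data as raw bytes instead of escaping and re-parsing it. This assumes the compressed output is available as a byte string; the names `byte_values` and `max_value` are only illustrative. It also sidesteps a weakness of the escape-based filter above: a byte such as '\n' escapes to '\\n' rather than a '\\x..' form, so it would not parse as hex.

```python
# Sketch: treat the compressed output as raw bytes rather than hex tokens.
# The b'' literal and bytearray() behave the same on Python 2.6+ and Python 3.
s = b'\x1e\x00\x00\x00\xf0\x0f184203308373388492761797873987'

# bytearray yields one integer in the range 0-255 per byte
byte_values = list(bytearray(s))
max_value = max(byte_values)

print(byte_values[:6])  # [30, 0, 0, 0, 240, 15]
print(hex(max_value))   # 0xf0
```

Note that the trailing digits are included as their character codes (for example, '1' is 49), which matches what ord(max(s)) compares.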

1 Comment

Geez. I was complicating something simple. My thanks to both you and @DSM
