3

How do I write a regexp that matches a sequence of bytes?
For example, how can I check with a regexp that binary data consists only of bytes with values 0-10?

data = 0x00 0x05 0x02 0x00 0x03 ... (not a string, binary data)

3
  • I know, I was just wondering whether it can be done with regexps or not. I didn't find this information in the Python docs. Commented Jul 14, 2011 at 10:31
  • 1
    @mac: Python strings are essentially byte arrays, and they can contain binary data, which regexes can match against just fine. Commented Jul 14, 2011 at 10:34
  • Could you post the Python repr of some data (e.g. print repr(data[:50])) that you have? It's not clear if you have binary data or a hex representation of such in a string. Commented Jul 14, 2011 at 10:40

3 Answers

5

If you want to check that the string contains only characters between chr(0) and chr(10), simply use

re.match('^[\0-\x0A]*$',data)

For Python3, you can do the same with byte strings:

re.match(b'^[\0-\x0A]*$',b'\x01\x02\x03\x04')
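
A minimal sketch of the same check wrapped in a helper, assuming Python 3 and `bytes` input. The name `only_low_bytes` is hypothetical, and `re.fullmatch` is used here in place of the `^...$` anchors; that is an alternative spelling, not the code from the answer.

```python
import re

# Bytes pattern: zero or more bytes with values 0x00 through 0x0A.
LOW_BYTES = re.compile(rb'[\x00-\x0a]*')


def only_low_bytes(data: bytes) -> bool:
    """Return True if every byte in data has a value from 0 to 10."""
    # fullmatch succeeds only if the whole input matches the pattern.
    return LOW_BYTES.fullmatch(data) is not None


print(only_low_bytes(b'\x00\x05\x02\x00\x03'))  # True
print(only_low_bytes(b'\x00\x0b'))              # False
```

Note that an empty input also passes, because the pattern allows zero repetitions; use `+` instead of `*` if empty data should be rejected.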

2

This will match any character code below space (\040):

if re.search('[\0-\037]', line):
    # Contains binary data...

I'm not sure what you mean by "0-10 byte", but if you mean that you want to match only the byte values 0 to 10, then replace \037 with \012 in the above code.

Note that 0-10 aren't the only codes that suggest binary data; anything below \040 (space) or above \177 (octal) usually suggests binary data too.
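
The "looks like binary" heuristic described here could be sketched roughly as follows, assuming Python 3 and `bytes` input. The name `looks_binary` is hypothetical, and treating tab, line feed and carriage return as ordinary text is an assumption of this sketch rather than part of the answer.

```python
import re

# Any byte that is not tab, LF, CR, or printable ASCII (space through '~').
# Exempting tab, LF and CR is an assumption made for this sketch.
SUSPICIOUS = re.compile(rb'[^\t\n\r\x20-\x7e]')


def looks_binary(chunk: bytes) -> bool:
    """Return True if chunk contains at least one suspicious byte."""
    return SUSPICIOUS.search(chunk) is not None


print(looks_binary(b'hello\tworld\n'))  # False
print(looks_binary(b'\x00\x05\x02'))    # True
```

This is only a heuristic: UTF-8 encoded non-ASCII text contains bytes above 0x7E and would be flagged as well.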


0

If you want to check whether all characters in the given string are in the range 0x00 to 0x0B (upper bound not included), a regex is overkill. Try something like this:

>>> check_range = lambda x: ord(x) in set(range(0x00, 0x0B))
>>> s = '\x01\x02\x03\x0A'  # \x escapes need exactly two hex digits
>>> s2 = 'abcde'

>>> print(all(check_range(c) for c in s))
True
>>> print(all(check_range(c) for c in s2))
False

2 Comments

I tested it with the timeit module on a short string s (5 characters) over a million executions. The regexp wins (3.5x faster).
My own tests bear this out. I would delete this answer, but your comment is useful. With some optimization, I managed to bring it down to about 1.3x the running time of the regex version, but not below it.
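
One rough way to reproduce this kind of comparison, assuming Python 3 and `bytes` input. The function names are hypothetical, the `translate` variant is an extra alternative not mentioned in the thread, and no particular timings are claimed; results will vary by machine and Python version.

```python
import re
import timeit

data = b'\x01\x02\x03\x04\x0a'
pattern = re.compile(rb'[\x00-\x0a]*')
allowed = bytes(range(0x0b))  # byte values 0 through 10


def with_regex(d=data):
    return pattern.fullmatch(d) is not None


def with_all(d=data):
    # Iterating bytes in Python 3 yields ints, so no ord() is needed.
    return all(b <= 0x0a for b in d)


def with_translate(d=data):
    # Delete every allowed byte; anything left over is out of range.
    return not d.translate(None, allowed)


for fn in (with_regex, with_all, with_translate):
    seconds = timeit.timeit(fn, number=100_000)
    print(f'{fn.__name__}: {seconds:.3f}s')
```

All three functions return the same answer for the same input, so only the timings differ.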
