byte operations (XOR) in python

Question

#!/usr/bin/env python3

import binascii


var=binascii.a2b_qp("hello")
key=binascii.a2b_qp("supersecretkey")[:len(var)]

print(binascii.b2a_qp(var))
print(binascii.b2a_qp(key))


# here I want to do an XOR operation on the bytes in var and key and place the result in 'encrypted': encrypted = var XOR key

print(binascii.b2a_qp(encrypted))

If someone could enlighten me on how I could accomplish this I would be very happy. I'm very new to data-type conversions, and reading through the Python wiki is not as clear as I would like.

Do you mean XORing the var string against the key string? Mind you, they have different lengths. In Python the XOR operator is ^.
– Pynchia, Commented Apr 2, 2015 at 8:21
So my use of [:len(var)] to cut the key to the same size as the var string will not work? I thought each character is converted into a single byte, where a=97=01100001 for example. When I use encrypted = var ^ key I get "TypeError: unsupported operand type(s) for ^: 'bytes' and 'bytes'".
– jden, Commented Apr 2, 2015 at 8:26
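
The TypeError quoted in the comment above arises because Python 3 defines ^ for integers but not for bytes objects. A minimal sketch of that behaviour, assuming Python 3; indexing a bytes object yields an integer, and integers can be XORed:

```python
var = b"hello"
key = b"supersecretkey"[:len(var)]

# bytes ^ bytes is not defined, so this raises TypeError.
try:
    var ^ key
    xor_supported = True
except TypeError:
    xor_supported = False

# Indexing a bytes object gives an int in range(256), and ints support ^.
first = var[0] ^ key[0]  # 0x68 ^ 0x73

print(xor_supported)  # False
print(first)          # 27
```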

Georg Plaz · Accepted Answer · 2022-07-21 13:10:55Z

57

Comparison of two python3 solutions

The first one is based on zip:

def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))

The second one uses int.from_bytes and int.to_bytes:

import sys

def encrypt2(var, key, byteorder=sys.byteorder):
    # Truncate both inputs to the length of the shorter one.
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    int_enc = int_var ^ int_key
    return int_enc.to_bytes(len(var), byteorder)

Simple tests:

assert encrypt1(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
assert encrypt2(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
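
Because XOR is its own inverse, applying the same function with the same key a second time recovers the input. A small check of that property, restating the zip-based function so the snippet runs on its own (the name xor_bytes is illustrative, not taken from the answer):

```python
def xor_bytes(data, key):
    # XOR byte by byte; zip stops at the shorter of the two inputs.
    return bytes(a ^ b for a, b in zip(data, key))

message = b"hello"
key = b"supersecretkey"

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)

print(ciphertext)  # b'\x1b\x10\x1c\t\x1d'
print(recovered)   # b'hello'
```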

Performance tests with var and key being 1000 bytes long:

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt1(a, b)"
10000 loops, best of 3: 100 usec per loop

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt2(a, b)"
100000 loops, best of 3: 5.1 usec per loop

In this measurement the integer approach is roughly 20 times faster (5.1 usec versus 100 usec per loop).
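
The same comparison can also be run from a single script with the timeit module rather than from the command line. This is a sketch that assumes both functions are defined in the script itself; absolute timings depend on the machine and the Python version, so no expected numbers are shown.

```python
import sys
import timeit

def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))

def encrypt2(var, key, byteorder=sys.byteorder):
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

a = b"abcdefghij" * 100
b = b"0123456789" * 100

# Both implementations should agree before they are timed.
same_result = encrypt1(a, b) == encrypt2(a, b)

runs = 1000
t1 = timeit.timeit(lambda: encrypt1(a, b), number=runs)
t2 = timeit.timeit(lambda: encrypt2(a, b), number=runs)
print(f"zip: {t1 / runs * 1e6:.1f} usec per call")
print(f"int: {t2 / runs * 1e6:.1f} usec per call")
```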

edited Jul 21, 2022 at 13:10

Georg Plaz

answered Apr 2, 2015 at 9:19

Vincent

4 Comments

Bora M. Alper Over a year ago

One might simply use int.from_bytes(bytes_object, endianness) to convert a bytes object to an integer directly (and in a saner way).

Vincent Over a year ago

@Czechnology The integer approach seems to be significantly faster. See my edit.

Moiz Over a year ago

Both encrypt1 and encrypt2 fail to fully encrypt var if the key is shorter than var. For example, the call encrypt2(b'hello world', b'ab') results in only the first two characters being encrypted: b'\t\x07llo world'

Moiz Over a year ago

Adding if len(key) < len(var): key = key * int(len(var)/len(key) + 1) before key = key[:len(var)] will fix the issue
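
The fix suggested in the comment above can be sketched as follows. The function name encrypt_repeating is hypothetical, and integer ceiling division is used here in place of the float division from the comment:

```python
import sys

def encrypt_repeating(var, key, byteorder=sys.byteorder):
    if not key:
        raise ValueError("key must not be empty")
    # Repeat the key until it is at least as long as var, then trim it.
    repeats = -(-len(var) // len(key))  # ceiling division
    key = (key * repeats)[:len(var)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

encrypted = encrypt_repeating(b"hello world", b"ab")
decrypted = encrypt_repeating(encrypted, b"ab")
print(decrypted)  # b'hello world'
```

With the key repeated, every byte of var is XORed, so the output no longer contains unencrypted trailing bytes.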

DNA · Accepted Answer · 2017-11-02 22:06:40Z

27

It looks like what you need to do is XOR each of the characters in the message with the corresponding character in the key. However, to do that you need a bit of interconversion using ord and chr, because you can only xor numbers, not strings:

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, key) ] 
>>> encrypted
['\x1b', '\x10', '\x1c', '\t', '\x1d']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, key) ]
>>> decrypted
['h', 'e', 'l', 'l', 'o']

>>> "".join(decrypted)
'hello'

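Note that in Python 2, binascii.a2b_qp("hello") just converts a string to another string (though possibly with different encoding). In Python 3 it returns a bytes object, whose elements are already integers, so the ord and chr calls above apply only when var and key are str.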

Your approach, and my code above, will only work if the key is at least as long as the message. However, you can easily repeat the key if required using itertools.cycle:

>>> from itertools import cycle
>>> var="hello"
>>> key="xy"

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, cycle(key)) ]
>>> encrypted
['\x10', '\x1c', '\x14', '\x15', '\x17']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, cycle(key)) ]
>>> "".join(decrypted)
'hello'

To address the issue of unicode/multi-byte characters (raised in the comments below), one can convert the string (and key) to bytes, zip these together, then perform the XOR, something like:

>>> var=u"hello\u2764"
>>> key="supersecretkey"
>>> var
'hello❤'

>>> encrypted = [ a ^ b for (a,b) in zip(bytes(var, 'utf-8'),cycle(bytes(key, 'utf-8'))) ]
>>> encrypted
[27, 16, 28, 9, 29, 145, 248, 199]

>>> decrypted = [ a ^ b for (a,b) in zip(bytes(encrypted), cycle(bytes(key, 'utf-8'))) ]
>>> decrypted
[104, 101, 108, 108, 111, 226, 157, 164]

>>> bytes(decrypted)
b'hello\xe2\x9d\xa4'

>>> bytes(decrypted).decode()
'hello❤'
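
The byte-level steps above can be wrapped in a pair of helpers. This is a sketch that assumes UTF-8 for both the text and the key; the names xor_encrypt and xor_decrypt are illustrative. The ciphertext is kept as bytes because XORed bytes are generally not valid UTF-8.

```python
from itertools import cycle

def xor_encrypt(text, key):
    data = text.encode("utf-8")
    key_bytes = key.encode("utf-8")
    if not key_bytes:
        raise ValueError("key must not be empty")
    # cycle() repeats the key for as long as there is data to pair it with.
    return bytes(a ^ b for a, b in zip(data, cycle(key_bytes)))

def xor_decrypt(ciphertext, key):
    key_bytes = key.encode("utf-8")
    if not key_bytes:
        raise ValueError("key must not be empty")
    plain = bytes(a ^ b for a, b in zip(ciphertext, cycle(key_bytes)))
    return plain.decode("utf-8")

ciphertext = xor_encrypt("hello\u2764", "supersecretkey")
print(list(ciphertext))  # [27, 16, 28, 9, 29, 145, 248, 199]
print(xor_decrypt(ciphertext, "supersecretkey") == "hello\u2764")  # True
```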

edited Nov 2, 2017 at 22:06

answered Apr 2, 2015 at 8:28

DNA

5 Comments

Hamy Over a year ago

@DNA - nice! Fails for unicode input though... zip places characters into the tuples, then chr gets confused because the unicode character is out of its range. E.g. var=u'\u2764' would cause an exception....❤

DNA Over a year ago

@Hamy You may be able to use unichr() instead of chr() to fix this, but I haven't tried it yet...

Hamy Over a year ago

@DNA - Good thought, but I think it would XOR the wrong data: the two-byte unicode character passed to ord would be XORed with a one-byte ASCII character, with only the low bits being combined, when the goal is to treat both var and key as byte streams and XOR them one bit at a time. E.g. bin(ord(u'\u1000')) is 0b1000000000000, so if I OR it with a byte of all 1s as a stream operation then the high bits should be one, but in reality this happens: bin(ord('\xFF') | ord(u'\u1000')) is 0b1000011111111

Hamy Over a year ago

IMO this just underlines how tricky Python 2 can be for byte operations... the only quick fix I see for this is to double-check that the input is a str, not a unicode, e.g. if not isinstance(var, str) or not isinstance(key, str)

DNA Over a year ago

Note that the OP is using Python 3

Latze · Accepted Answer · 2022-03-01 00:36:01Z

3

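You can use NumPy to do this faster:

import numpy as np

def encrypt(var, key):
    # View each bytes object as an array of unsigned 8-bit integers (no copy).
    a = np.frombuffer(var, dtype=np.uint8)
    b = np.frombuffer(key, dtype=np.uint8)
    # Element-wise XOR; var and key are expected to have the same length.
    return (a ^ b).tobytes()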

answered Mar 1, 2022 at 0:36

Latze
