28
#!/usr/bin/env python3

import binascii


var=binascii.a2b_qp("hello")
key=binascii.a2b_qp("supersecretkey")[:len(var)]

print(binascii.b2a_qp(var))
print(binascii.b2a_qp(key))


# Here I want to XOR the bytes in var and key and store the result in 'encrypted': encrypted = var XOR key

print(binascii.b2a_qp(encrypted))

If someone could enlighten me on how I could accomplish this, I would be very happy. I'm very new to data-type conversions, and reading through the Python wiki has not been as clear as I would like.

2
  • Do you mean XORing the var string against the key string? Mind you, they have different lengths. In Python the XOR operator is ^. Commented Apr 2, 2015 at 8:21
  • So my use of [:len(var)] to cut the key to the same size as the var string will not work? I thought each character is converted into a single byte, where a=97=01100001 for example. When I use encrypted = var ^ key I get "TypeError: unsupported operand type(s) for ^: 'bytes' and 'bytes'". Commented Apr 2, 2015 at 8:26
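
The TypeError quoted in the last comment arises because bytes objects do not define the ^ operator, whereas iterating over a bytes object in Python 3 yields ints, which do. A minimal sketch of that difference, reusing the variable names from the question:

```python
var = b"hello"
key = b"supersecretkey"[:len(var)]

# bytes ^ bytes is not defined, so this raises TypeError
try:
    var ^ key
except TypeError as exc:
    print("TypeError:", exc)

# Iterating over bytes yields ints in range(256), and ints support ^
print(list(var))  # [104, 101, 108, 108, 111]

# XOR the ints pairwise and pack the results back into a bytes object
encrypted = bytes(a ^ b for a, b in zip(var, key))
print(encrypted)  # b'\x1b\x10\x1c\t\x1d'
```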

3 Answers

57

Comparison of two Python 3 solutions

The first one is based on zip:

def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))

The second one uses int.from_bytes and int.to_bytes:

import sys

def encrypt2(var, key, byteorder=sys.byteorder):
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    int_enc = int_var ^ int_key
    return int_enc.to_bytes(len(var), byteorder)

Simple tests:

assert encrypt1(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
assert encrypt2(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'

Performance tests with var and key being 1000 bytes long:

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt1(a, b)"
10000 loops, best of 3: 100 usec per loop

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt2(a, b)"
100000 loops, best of 3: 5.1 usec per loop

The integer approach is significantly faster in this test: about 5.1 usec versus 100 usec per call, roughly a 20x speedup.
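
The same comparison can also be run as a single script instead of from the shell. This is only a sketch: it redefines both functions so it is self-contained, the loop count of 1000 is an arbitrary choice, and the absolute timings will vary by machine and Python version.

```python
import sys
import timeit

def encrypt1(var, key):
    # XOR byte by byte
    return bytes(a ^ b for a, b in zip(var, key))

def encrypt2(var, key, byteorder=sys.byteorder):
    # XOR both inputs as single large integers
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

a = b"abcdefghij" * 100
b = b"0123456789" * 100

# Both approaches should produce identical output before we compare speed
assert encrypt1(a, b) == encrypt2(a, b)

loops = 1000  # arbitrary; increase for more stable numbers
for func in (encrypt1, encrypt2):
    seconds = timeit.timeit(lambda: func(a, b), number=loops)
    print(f"{func.__name__}: {seconds / loops * 1e6:.1f} usec per call")
```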


4 Comments

One might simply use int.from_bytes(bytes_object, endianness) to convert a bytes object to an integer directly (and in a saner way).
@Czechnology The integer approach seems to be significantly faster. See my edit.
Both encrypt1 and encrypt2 fail to fully encrypt var if key is shorter than var. For example, the call encrypt2(b'hello world', b'ab') results in only the first two characters being encrypted: b'\t\x07llo world'
Adding if len(key) < len(var): key = key * int(len(var)/len(key) + 1) before key = key[:len(var)] will fix the issue
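
The key repetition suggested in the comment above could look roughly like this. It is a sketch, not part of the original answer: the name encrypt2_repeating is hypothetical, and it uses integer ceiling division rather than int(len(var)/len(key) + 1) to work out how many copies of the key are needed.

```python
import sys

def encrypt2_repeating(var, key, byteorder=sys.byteorder):
    # Hypothetical variant of encrypt2 that repeats a short key
    if not key:
        raise ValueError("key must not be empty")
    # Repeat the key enough times to cover var, then trim to the exact length
    repeats = -(-len(var) // len(key))  # ceiling division
    key = (key * repeats)[:len(var)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

encrypted = encrypt2_repeating(b"hello world", b"ab")
print(encrypted)
# XOR with the same key a second time restores the original message
print(encrypt2_repeating(encrypted, b"ab"))  # b'hello world'
```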
27

It looks like what you need to do is XOR each of the characters in the message with the corresponding character in the key. However, to do that you need a bit of interconversion using ord and chr, because you can only xor numbers, not strings:

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, key) ] 
>>> encrypted
['\x1b', '\x10', '\x1c', '\t', '\x1d']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, key) ]
>>> decrypted
['h', 'e', 'l', 'l', 'o']

>>> "".join(decrypted)
'hello'

Note that binascii.a2b_qp("hello") only decodes quoted-printable data: in Python 2 it converts a string to another string, and in Python 3 it returns a bytes object (here b'hello').

Your approach, and my code above, will only work if the key is at least as long as the message. However, you can easily repeat the key if required using itertools.cycle:

>>> from itertools import cycle
>>> var="hello"
>>> key="xy"

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, cycle(key)) ]
>>> encrypted
['\x10', '\x1c', '\x14', '\x15', '\x17']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, cycle(key)) ]
>>> "".join(decrypted)
'hello'

To address the issue of unicode/multi-byte characters (raised in the comments below), one can convert the string (and key) to bytes, zip these together, then perform the XOR, something like:

>>> key="supersecretkey"
>>> var=u"hello\u2764"
>>> var
'hello❤'

>>> encrypted = [ a ^ b for (a,b) in zip(bytes(var, 'utf-8'),cycle(bytes(key, 'utf-8'))) ]
>>> encrypted
[27, 16, 28, 9, 29, 145, 248, 199]

>>> decrypted = [ a ^ b for (a,b) in zip(bytes(encrypted), cycle(bytes(key, 'utf-8'))) ]
>>> decrypted
[104, 101, 108, 108, 111, 226, 157, 164]

>>> bytes(decrypted)
b'hello\xe2\x9d\xa4'

>>> bytes(decrypted).decode()
'hello❤'
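
The bytes-based steps above can be wrapped into a pair of helpers. This is a sketch under the assumption that both the text and the key are encoded as UTF-8; the names xor_bytes, encrypt_text and decrypt_text are made up for illustration.

```python
from itertools import cycle

def xor_bytes(data, key):
    # XOR each byte of data with the key, repeating the key as needed
    return bytes(a ^ b for a, b in zip(data, cycle(key)))

def encrypt_text(text, key, encoding="utf-8"):
    # Encode first, so multi-byte characters are XORed byte by byte
    return xor_bytes(text.encode(encoding), key.encode(encoding))

def decrypt_text(blob, key, encoding="utf-8"):
    # XOR is its own inverse, so decryption is the same operation plus decode
    return xor_bytes(blob, key.encode(encoding)).decode(encoding)

blob = encrypt_text("hello\u2764", "supersecretkey")
print(list(blob))  # [27, 16, 28, 9, 29, 145, 248, 199]
print(decrypt_text(blob, "supersecretkey"))  # hello❤
```

Keeping the encrypted result as bytes rather than a str avoids having to decode values that may not form valid UTF-8.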

5 Comments

@DNA - nice! Fails for unicode input though... zip places characters into the tuples, then chr gets confused because the unicode character is out of its range. E.g. var=u'\u2764' would cause an exception... ❤
@Hamy You may be able to use unichr() instead of chr() to fix this, but I haven't tried it yet...
@DNA - Good thought, I think it would XOR the wrong data - the two-byte unicode character passed to ord would be xor'ed with a one-byte ascii character with the low bits being combined, when the goal is to treat both var and key as a byte stream and xor them one-bit at a time. E.g. bin(ord(u'\u1000')) is 0b1000000000000 so if I OR it with a byte of all 1s as a stream operation then the high bits should be one, but in reality this happens - bin(ord('\xFF') | ord(u'\u1000')) is 0b1000011111111
IMO this just underlines how tricky Python 2 can be for byte operations... the only quick fix I see for this is to double-check that the input is a str, not a unicode, e.g. if not isinstance(var, str) or not isinstance(key, str)
Note that the OP is using Python 3
3

You can use NumPy to do this faster. Note that var and key must have the same length here:

import numpy as np
def encrypt(var, key):
    a = np.frombuffer(var, dtype = np.uint8)
    b = np.frombuffer(key, dtype = np.uint8)
    return (a^b).tobytes()
