I decided to write my own Base64 encoder and decoder as a fun project, even though the standard library already has a base64 module. However, the encoder incorrectly encodes some characters, and I haven't had any luck debugging it. I've tried to follow the algorithm described on Wikipedia to a tee. I suspect the problem is in the conversion to binary, but I'm not sure.

Code:

def encode_base64(data):
    raw_bits = ''.join('0' + bin(i)[2:] for i in data)
    # First bit is usually (always??) 0 in ascii characters
    
    split_by_six = [raw_bits[i: i + 6] for i in range(0, len(raw_bits), 6)]
    
    if len(split_by_six[-1]) < 6: # Add extra zeroes if necessary
        split_by_six[-1] = split_by_six[-1] + ((6 - len(split_by_six[-1])) * '0')
    
    padding = 2 if len(split_by_six) % 2 == 0 else 1
    if len(split_by_six) % 4 == 0: # See if padding is necessary
        padding = 0
    
    indexer = ([chr(i) for i in range(65, 91)] # Base64 Table
         + [chr(i) for i in range(97, 123)]
         + [chr(i) for i in range(48, 58)]
         + ['+', '/'])
    
    return ''.join(indexer[int(i, base=2)] for i in split_by_six) + ('=' * padding)

When I run the following sample code, I get an incorrect result:

print(encode_base64(b'any carnal pleasure'))
# OUTPUT: YW55QMbC5NzC2IHBsZWFzdXJl=
# What I should be outputting: YW55IGNhcm5hbCBwbGVhc3VyZQ==
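
For reference, the standard library's base64.b64encode gives the expected values. The Wikipedia example string ends with a period, which changes the last few characters of the output:

```python
import base64

# Reference encodings from the standard library, for checking a hand-written encoder.
samples = [b'any carnal pleasure', b'any carnal pleasure.']
for s in samples:
    # b64encode returns bytes; decode to get a printable str
    print(s, '->', base64.b64encode(s).decode('ascii'))

# b'any carnal pleasure' -> YW55IGNhcm5hbCBwbGVhc3VyZQ==
# b'any carnal pleasure.' -> YW55IGNhcm5hbCBwbGVhc3VyZS4=
```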

Oddly, the first few characters are correct, but the rest aren't. I'm happy to answer any questions!

Answer

Python's bin() function doesn't include leading zeroes, so the length of its output varies with the value:

>>> bin(1)
'0b1'
>>> bin(255)
'0b11111111'
>>> bin(ord("a"))
'0b1100001'
>>> bin(ord(" "))
'0b100000'

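In your input, the bytes for a, n, and y (97, 110, 121) each have exactly one leading zero in their 8-bit representation, so bin(i)[2:] is 7 digits long and '0' + bin(i)[2:] gives the correct 8 bits. But ' ' (32) has two leading zeroes: bin(32)[2:] is only 6 digits, so your code produces 7 bits instead of 8, and every bit after it in raw_bits is shifted out of alignment. That is why the first four output characters (YW55, which encode "any") are correct and everything after them is wrong. The opposite problem occurs for any byte of 128 or above: bin(i)[2:] already has 8 digits, and the extra '0' makes it 9. So the "first bit is always 0" assumption only holds for ASCII (0-127), and even then only for values of 64 and above.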
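
A quick way to see the misalignment is to print the question's bit strings next to fixed-width ones for the first five bytes of the input:

```python
data = b'any carnal pleasure'

# Compare the question's bit strings with fixed-width 8-bit ones.
for i in data[:5]:
    buggy = '0' + bin(i)[2:]   # the question's approach: width depends on the value
    fixed = format(i, '08b')   # always exactly 8 binary digits
    print(repr(chr(i)), buggy, len(buggy), fixed, len(fixed))

# 'a' 01100001 8 01100001 8
# 'n' 01101110 8 01101110 8
# 'y' 01111001 8 01111001 8
# ' ' 0100000 7 00100000 8
# 'c' 01100011 8 01100011 8
```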

To fix this, pad each binary representation with leading zeroes to exactly 8 digits. The simplest way is format(i, "08b"), which zero-pads to a width of 8 and omits the 0b prefix. format(i, "#010b")[2:] also works: it pads to 10 characters including the 0b, then discards the prefix. Note that iterating over a bytes object already yields integers, so ord() isn't needed; ord(i) would actually raise a TypeError here.
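
Here is a sketch of the question's encoder with that change applied. Two further tweaks are my own additions and aren't needed for this bug: it guards against empty input (which would otherwise raise IndexError on split_by_six[-1]), and it computes padding directly from len(data) % 3, which gives the same result as the original chunk-count logic for non-empty input. It checks itself against base64.b64encode:

```python
import base64


def encode_base64(data: bytes) -> str:
    """Encode bytes as standard Base64 (A-Z, a-z, 0-9, '+', '/', '=' padding)."""
    # Pad every byte to exactly 8 bits; iterating over bytes yields ints.
    raw_bits = ''.join(format(i, '08b') for i in data)

    # Split into 6-bit groups and right-pad the final group with zeroes.
    split_by_six = [raw_bits[i:i + 6] for i in range(0, len(raw_bits), 6)]
    if split_by_six and len(split_by_six[-1]) < 6:
        split_by_six[-1] = split_by_six[-1].ljust(6, '0')

    # One '=' for each byte missing from the final 3-byte group.
    padding = (3 - len(data) % 3) % 3

    indexer = ([chr(i) for i in range(65, 91)]     # A-Z
               + [chr(i) for i in range(97, 123)]  # a-z
               + [chr(i) for i in range(48, 58)]   # 0-9
               + ['+', '/'])

    return ''.join(indexer[int(group, base=2)] for group in split_by_six) + '=' * padding


# Compare against the standard library; each line should end with True.
for sample in (b'', b'any carnal pleasure', b'any carnal pleasure.'):
    print(encode_base64(sample), encode_base64(sample) == base64.b64encode(sample).decode('ascii'))
```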
