I'm dealing with a communication channel that:
- Converts \n characters into \r\n
- Removes 0xfd, 0xfe, and 0xff characters entirely
A channel like this is not ideal for transferring binary data. I can use base64 to get binary data through it intact, but base64 makes the data about 33% larger (every 3 input bytes become 4 output bytes), which kinda sucks when dealing with large amounts of binary data like I am.
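To make the constraints concrete, here is a small model of the channel as described. The `channel` function is a hypothetical stand-in, and it assumes the channel does nothing beyond the newline conversion and the three deletions. It shows raw binary being mangled while base64 passes through unchanged at a 4/3 size cost.

```python
import base64
import os

def channel(data: bytes) -> bytes:
    # Hypothetical model of the channel described above:
    # \n becomes \r\n, then 0xfd, 0xfe and 0xff are dropped
    return data.replace(b'\n', b'\r\n').translate(None, delete=b'\xfd\xfe\xff')

# Raw binary gets mangled
print(channel(b'a\nb\xff'))  # b'a\r\nb'

payload = os.urandom(3000)
encoded = base64.b64encode(payload)

# base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so it passes through unchanged
print(channel(encoded) == encoded)  # True
# Every 3 input bytes become 4 output bytes
print(len(payload), len(encoded))   # 3000 4000
```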
A simple, though not very efficient, way to implement my own encoding in Python is this:
escape_char = b'='
# These are the bytes we don't want in our stream
bytes_to_escape = b'\n\xfd\xfe\xff'
# This is a mapping of the bytes being replaced, with the bytes we're
# replacing them with
replacements = {
    escape_char: escape_char,
    **{
        byte: i.to_bytes(1, "big")
        for i, byte in enumerate([bytes([b]) for b in bytes_to_escape])
    }
}
# Reverse mapping, for decoding
reverse_replacements = {v: k for k, v in replacements.items()}

# Encoder
def encode(data: bytes) -> bytes:
    result = b''
    for byte in [bytes([i]) for i in data]:
        replacement = replacements.get(byte)
        if replacement:
            result += escape_char + replacement
        else:
            result += byte
    return result

# Decoder
def decode(data: bytes) -> bytes:
    result = b''
    i = 0
    while i < len(data):
        current_byte = bytes([data[i]])
        next_byte = bytes([data[i+1]]) if i+1 < len(data) else None
        if current_byte == escape_char:
            if not next_byte:
                result += current_byte
            else:
                replacement = reverse_replacements[next_byte]
                result += replacement
                i += 1
        else:
            result += current_byte
        i += 1
    return result
This will essentially:
- Escape the = character as ==, because it's used as our escape character
- Convert \n to = followed by 0x00
- Convert 0xfd to = followed by 0x01
- Convert 0xfe to = followed by 0x02
- Convert 0xff to = followed by 0x03
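Because only 5 of the 256 possible byte values get escaped, and each escape adds exactly one byte, the expected overhead for uniformly random input is 5/256, about 2%, compared with 33% for base64. The worst case, input made entirely of escaped bytes, doubles the size. A small sketch that computes the encoded size without actually encoding (`encoded_size` is a hypothetical helper, not part of the code above):

```python
# The five byte values this encoding escapes (iterating bytes yields ints)
ESCAPED = frozenset(b'=\n\xfd\xfe\xff')

def encoded_size(data: bytes) -> int:
    # Each escaped byte becomes a two-byte pair; every other byte is copied as-is
    return len(data) + sum(b in ESCAPED for b in data)

print(encoded_size(bytes(range(256))))  # 261: every byte value once, 5 of them escaped
print(encoded_size(b'=' * 1000))        # 2000: worst case, size doubles
print(5 / 256)                          # 0.01953125, expected overhead for uniformly random bytes
```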
Is there a way to make this more efficient, short of writing it in a lower-level language as a Python extension module? Would something using regex be faster?
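One possible direction, sketched here without benchmarks: encode with a chain of `bytes.replace` calls, each a single pass done in C, and decode with one `re.sub` pass. Decoding can't safely use chained replaces, because an escaped = followed by a literal 0x00 (encoded as ==\x00) would be misread as = plus \n; a left-to-right regex scan consumes == as a pair first. The names `encode_fast` and `decode_fast` are hypothetical, and the mapping is the same as the question's.

```python
import re

# Same mapping as the question: '=' -> '==', '\n' -> '=\x00',
# 0xfd -> '=\x01', 0xfe -> '=\x02', 0xff -> '=\x03'
ENCODE_STEPS = [
    (b'=', b'=='),  # must run first, or the '=' added by later steps gets doubled
    (b'\n', b'=\x00'),
    (b'\xfd', b'=\x01'),
    (b'\xfe', b'=\x02'),
    (b'\xff', b'=\x03'),
]
# Byte after the escape char -> original byte
DECODE_MAP = {new[1:]: old for old, new in ENCODE_STEPS}

# An escape byte followed by any one byte; DOTALL so '.' also matches a raw b'\n'
DECODE_RE = re.compile(rb'=(.)', re.DOTALL)

def encode_fast(data: bytes) -> bytes:
    # Each bytes.replace is a single pass implemented in C
    for old, new in ENCODE_STEPS:
        data = data.replace(old, new)
    return data

def decode_fast(data: bytes) -> bytes:
    # re.sub scans left to right, so '==' is consumed as one pair before the
    # following byte can be misread as part of another escape.
    # Unknown escape pairs raise KeyError; a trailing lone '=' is left as-is,
    # matching the question's decoder.
    return DECODE_RE.sub(lambda m: DECODE_MAP[m.group(1)], data)

print(encode_fast(b'a=\n\xff'))  # b'a===\x00=\x03'
```

The decoder still calls a Python lambda once per escape pair, which for random data is roughly 2% of bytes, so most of the work stays in C; actual speedups over the original loop would need measuring on real data.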