I just finished writing a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(), and I'm now attempting to decompress it. My only concern is that I cannot convert my byte array back into a string. Is there a built-in function I could use to convert my byte array (stored in a variable) back into a string? If not, is there a better way to convert my compressed string to something else? I attempted to use byte_array.decode() and I get this:

print("Index: ", Index) # The Index


# Substituting each symbol in the text with its code from the index

for x in range(len(TextTest)):

    TextTest[x]=Index[TextTest[x]]


NewText=''.join(TextTest)

# print(NewText)
# NewText=int(NewText)


byte_array = bytearray() # Converts the compressed string text to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))


NewSize = ("Compressed file Size:",sys.getsizeof(byte_array),'bytes')

print(byte_array)

print(NewSize)

x=bytes(byte_array)
x.decode()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte

  • You can convert it to a string by calling the bytearray.decode() method and supplying an encoding. For example: byte_array.decode('ascii'). If you leave the encoding argument out, it defaults to 'utf-8'. Commented Nov 21, 2018 at 7:15
  • Hey, I got this when I added your code: byte_array.decode('ascii') UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 0: ordinal not in range(128). When I removed the 'ascii' part I got: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte Commented Nov 23, 2018 at 10:11
  • That means the data in your byte array doesn't contain valid characters in those encodings. You need to find an acceptable one. There are some listed in the documentation; 'hex' might be good (in Python 3 that means byte_array.hex(), because decode() does not accept 'hex' as a text encoding). You can also use 'latin1', which maps the code points 0–255 to the bytes 0x0–0xff. Doing so will allow you to convert the result back to bytes later by using the_string.encode('latin1'). I first heard about doing this in this answer to an unrelated question (to solve a different problem). Commented Nov 23, 2018 at 10:43
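
The two suggestions in the last comment can be sketched as follows. This is only an illustration: the sample bytes are made up (they start with the 0x88 from the error message), and it assumes Python 3.5 or later for the hex() method.

```python
# Sample bytes that are not valid UTF-8 or ASCII (made up for illustration;
# the first one is the 0x88 from the error message in the question).
data = bytearray([0x88, 0x00, 0xFF, 0x41])

# latin-1 maps every byte value 0-255 to the code point with the same number,
# so decoding cannot fail and encoding restores the original bytes.
text = data.decode('latin-1')
restored = text.encode('latin-1')

# hex() produces a printable hexadecimal string; bytes.fromhex() reverses it.
hex_text = data.hex()
from_hex = bytes.fromhex(hex_text)

print(repr(text))
print(hex_text)
print(restored == bytes(data), from_hex == bytes(data))
```

The latin-1 string has one character per byte but is not human-readable text; the hex string is printable but twice as long as the data.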

1 Answer

You can use .decode('ascii') (leave the argument empty to default to UTF-8). This works only when every byte is valid in the chosen encoding.

>>> print(bytearray("abcd", 'utf-8').decode())
abcd

Source: Convert bytes to a string?
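
If the goal is to get back the string of 0s and 1s for Huffman decoding rather than readable text, decode() may not be needed at all. Below is a minimal sketch under that assumption. The helper names bits_to_bytes and bytes_to_bits are hypothetical, and the packing differs from the question's loop in one respect: the final chunk is padded on the right and the original bit count is kept, since otherwise leading zeros in a short last chunk cannot be recovered.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes (hypothetical helper).

    Returns the packed bytes and the original number of bits.
    """
    packed = bytearray()
    for i in range(0, len(bits), 8):
        # Pad the final chunk on the right so every chunk is exactly 8 bits.
        chunk = bits[i:i + 8].ljust(8, '0')
        packed.append(int(chunk, 2))
    return bytes(packed), len(bits)


def bytes_to_bits(packed, bit_count):
    """Rebuild the bit string, dropping the padding added when packing."""
    bits = ''.join(format(byte, '08b') for byte in packed)
    return bits[:bit_count]


original = '100010000101'            # 12 bits, so the last chunk is short
packed, count = bits_to_bytes(original)
recovered = bytes_to_bits(packed, count)
print(packed, count, recovered == original)
```

The bit count has to be stored alongside the compressed bytes (for example in a small header) so the decompressor knows how many padding bits to discard.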
