0

So, I have this string 01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000

and I want to decode it using Python, but I'm getting this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 280: invalid start byte

According to this website: https://www.binaryhexconverter.com/binary-to-ascii-text-converter

The output should be S�ellotherehowyoudoingimfineareyoufineP

Here's my code:

def decodeAscii(bin_string):
    binary_int = int(bin_string, 2);
  
    byte_number = binary_int.bit_length() + 7 // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = binary_array.decode()
    
    print(ascii_text)

How do I fix it?

2
  • The � in the output seems to indicate a weird character outside the normal ASCII range. Why is that there? Commented Sep 23, 2021 at 7:47
  • @khelwood I'm trying to transfer data over audio, and this binary string is what the receiver got. No error-correcting techniques were implemented, and that's the reason for it. Commented Sep 23, 2021 at 7:53

3 Answers

2

Your bytes simply cannot be decoded as utf-8, just as the error message tells you.

utf-8 is the default value of decode's encoding parameter. The best way to pass the correct encoding is to know it; otherwise you'll have to guess.

And guessing is probably what the website does, too, by trying the most common encodings, until one does not throw an exception:

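def decodeAscii(bin_string):
    binary_int = int(bin_string, 2)
    # Parenthesize the addition: "+ 7 // 8" parses as "+ (7 // 8)", i.e. "+ 0"
    byte_number = (binary_int.bit_length() + 7) // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = "Bin string cannot be decoded"
    # 'ansi' is an alias for the Windows-only 'mbcs' codec; on other
    # platforms the lookup fails and that candidate is skipped
    for enc in ['utf-8', 'ascii', 'ansi']:
        try:
            ascii_text = binary_array.decode(encoding=enc)
            break
        except (UnicodeDecodeError, LookupError):
            pass
    print(ascii_text)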

s = "01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000"
decodeAscii(s)

Output:

S°ellotherehowyoudoingimfineareyoufineP

But there's no guarantee that you find the "correct" encoding by guessing.
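
A side note on the byte count, offered as a sketch rather than as part of the original answer: in `binary_int.bit_length() + 7 // 8` the floor division binds tighter than the addition, so the expression evaluates to `bit_length() + 0`. The question's string is 320 bits long and starts with a 0 bit, so `bit_length()` is 319 and `to_bytes` is asked for 319 bytes. That pads the 40 data bytes with 279 leading zero bytes, which would explain why the error reports position 280. The helper name `bits_to_bytes` below is hypothetical; it sizes the buffer from the string length, so leading zero bytes are kept as well.

```python
def bits_to_bytes(bin_string):
    """Convert a string of '0'/'1' characters to bytes (hypothetical helper)."""
    if len(bin_string) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Size the buffer from the string length rather than bit_length(),
    # so leading zero bytes are not dropped.
    return int(bin_string, 2).to_bytes(len(bin_string) // 8, "big")


# Operator precedence: // binds tighter than +.
n = int("0101001110110000", 2)      # the first two bytes of the question's string
wrong = n.bit_length() + 7 // 8     # parsed as bit_length() + (7 // 8)
right = (n.bit_length() + 7) // 8   # number of bytes needed to hold n
print(wrong, right)

data = bits_to_bytes("0101001110110000")
print(data)
```

With the correct byte count, the first undecodable byte sits at position 1 instead of 280, but it is still undecodable as utf-8, so the choice of encoding discussed above still matters.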

1

Your binary string is just not a valid ascii or utf-8 string. You can tell decode to ignore invalid sequences by passing errors='ignore':

ascii_text = binary_array.decode(errors='ignore')
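
For comparison, here is a small sketch of how the standard `errors` handlers treat the invalid byte. The three-byte sample `b"S\xb0e"` is just the start of the question's data, picked for illustration. `'replace'` substitutes U+FFFD (�), which looks like what the website in the question displays.

```python
sample = b"S\xb0e"  # first three bytes of the question's data

# 'ignore' silently drops the undecodable byte
ignored = sample.decode("utf-8", errors="ignore")
# 'replace' substitutes U+FFFD, the replacement character
replaced = sample.decode("utf-8", errors="replace")
# 'backslashreplace' keeps the byte value visible as an escape sequence
escaped = sample.decode("utf-8", errors="backslashreplace")

print(repr(ignored), repr(replaced), repr(escaped))
```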

1 Comment

Just keep in mind that this is more likely to lose actual data than the try...except approach. E.g. print("Hello Wörld".encode("ansi").decode(errors='ignore')) will throw away the ö, while the try...except approach might eventually guess the correct encoding.

1

It can be solved in one line by converting each 8-bit group to a character:

def bin_to_text(bin_str):
    # chr() maps each byte value (0-255) to the code point with the same number
    return "".join(chr(int(bin_str[i:i+8], 2)) for i in range(0, len(bin_str), 8))

bin_str = '01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000'
bin_to_str = bin_to_text(bin_str)
print(bin_to_str)

Output:

S°ellotherehowyoudoingimfineareyoufineP
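
`chr(int(bits, 2))` maps each byte value from 0 to 255 to the Unicode code point with the same number, which is also what the Latin-1 (ISO-8859-1) codec does. A roughly equivalent sketch that goes through `bytes` instead (the function name `bin_to_text_latin1` is made up for this example):

```python
def bin_to_text_latin1(bin_str):
    """Decode a '0'/'1' string, 8 bits per character, as Latin-1 (hypothetical helper)."""
    # Build the raw bytes first, one per 8-bit group
    raw = bytes(int(bin_str[i:i + 8], 2) for i in range(0, len(bin_str), 8))
    # Latin-1 assigns a character to every byte value, so decoding never fails
    return raw.decode("latin-1")


# The first three bytes of the question's string: 0x53, 0xB0, 0x65
print(bin_to_text_latin1("010100111011000001100101"))
```

Because every byte value is accepted, this approach never raises a UnicodeDecodeError, but it also cannot tell you whether Latin-1 was the encoding the sender actually used.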
