I am having a problem decoding byte strings that I have to send from one computer to another. The file is a PDF. I get this error:

fileStrings[i] = fileStrings[i].decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 648: invalid continuation byte

Any ideas on how to remove the b' ' marking? I need to reassemble the file on the other side, but I also need to know its size in bytes before sending it, and I figured I could get that by decoding each byte string. (This works for txt files but not for PDF ones.)

Code is:

    with open(inputne, "rb") as file:
        while 1:
            readBytes= file.read(dataMaxSize)
            fileStrings.append(readBytes)
            if not readBytes:
                break
            readBytes= ''
    
    filesize=0
    for i in range(0, len(fileStrings)):
        fileStrings[i] = fileStrings[i].decode()
        filesize += len(fileStrings[i])

Edit: For anyone having the same issue, calling len() on a byte string gives you its size in bytes; the b'' is not counted.

"size in bytes" - decoding translates bytes to characters, and the number of characters is not the same as the number of bytes. ∞ is one symbol but 3 bytes in UTF-8: b'\xe2\x88\x9e', or 8 bytes in UTF-32 (including the byte order mark). Commented Nov 30, 2020 at 15:23

1 Answer

In Python, bytestrings are for raw binary data, and strings are for textual data. Calling decode() with no arguments tries to interpret the data as UTF-8. That works for UTF-8 text files, but not for PDF files, since they can contain arbitrary byte sequences that are not valid UTF-8, hence the UnicodeDecodeError. You should not try to get a string at all, since bytestrings are designed for exactly this purpose. You can get the length of a bytestring in bytes as usual, with len(data). Many of the string operations also apply to bytestrings, such as concatenation (data1 + data2) and slicing (data[1:3]).
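As a rough sketch of the approach described above, the chunks can stay as bytes from start to finish. The helper name read_chunks, the sample payload, and the chunk size of 8 are illustrative assumptions, not part of the original code; a temporary file stands in for the PDF.

```python
import os
import tempfile


def read_chunks(path, chunk_size):
    """Read a file in binary mode and return its contents as a list of bytes chunks."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            chunks.append(chunk)
    return chunks


# Hypothetical sample data: contains bytes that are not valid UTF-8, as a PDF might.
sample = b"%PDF-1.4\n\xda\xff\x00binary payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

chunks = read_chunks(path, chunk_size=8)
filesize = sum(len(chunk) for chunk in chunks)  # size in bytes, no decoding involved
rebuilt = b"".join(chunks)                      # reassemble the original content

os.remove(path)
```

Here filesize equals the number of bytes written, and rebuilt is identical to the original data, so nothing needs to be decoded on either side of the transfer.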

As a side note, the b'' you see when you print a bytestring is only there because the __str__ method for bytestrings is equivalent to repr. It is not part of the data itself.
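A small illustration of that point, using the three-byte UTF-8 encoding of the infinity sign mentioned in the comment on the question (the variable names are only for this example):

```python
data = b"\xe2\x88\x9e"         # UTF-8 encoding of the infinity sign

shown = repr(data)             # the text that print(data) displays, b'' wrapper included
byte_count = len(data)         # counts only the bytes, not the b'' wrapper
text = data.decode("utf-8")    # these bytes are valid UTF-8, so decoding succeeds
char_count = len(text)         # counts characters, not bytes

print(shown, byte_count, char_count)
```

The printed form is longer than the data it describes: len(data) is 3, while the decoded text is a single character.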

1 Comment

Doesn't len() count the b' ' as part of the size? EDIT: No, it doesn't count the b' '. Leaving this as a side note for anyone with the same issue. Thanks for your answer @Aplet123, it helped.
