I've been having trouble loading images from a file as a string. Many of the functions I need in my program rely on the data being ASCII-encoded, and they fail on the data I give them, producing the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa8 in position 14: ordinal not in range(128)

So how would I go about converting this data to ASCII?

EDIT:

Here is my admittedly messy code I am using. Please do not comment about how messy it is, this is a rough draft:

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def str2int(string):
    binary = text_to_bits(string)
    number = int(binary, 2)
    return number

def go():
    #filen is the name of the file
    global filen
    #Reading the file
    content = str(open(filen, "r").read())
    #Using A function from above
    integer = str2int(content)
    #Write back to the file
    w = open(filen, "w").write(str(integer))
  • Any image processing library or method will accept binary strings, not Unicode. If you have a specific method that doesn't accept str image data, ask a specific question about that method. Commented Sep 28, 2015 at 8:37
  • ASCII only covers values in the range [0, 127], so a byte with value 0xA8 (168 > 127) cannot be decoded as ASCII. You should use another encoding for reading, and in particular read the file as binary data instead of text data. Commented Sep 28, 2015 at 8:38

1 Answer

Image data is not ASCII. Image data is binary, and thus uses bytes that the ASCII standard doesn't cover. Don't try to decode the data as ASCII. You also want to make sure you open your file in binary mode, to avoid platform-specific line separator translations, something that'll damage your image data.

Any method expecting to handle image data will deal with binary data, and in Python 2 that means you'll be handling that as the str type.
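As a minimal sketch of what binary mode looks like in practice (the helper names read_binary and write_binary are hypothetical, chosen only for illustration), both reading and writing use a 'b' mode flag so the bytes pass through untouched:

```python
def read_binary(path):
    # 'rb' returns the raw bytes of the file: no decoding and no
    # line separator translation.
    with open(path, 'rb') as fileobj:
        return fileobj.read()


def write_binary(path, data):
    # 'wb' writes the bytes exactly as given.
    with open(path, 'wb') as fileobj:
        fileobj.write(data)
```

Bytes such as 0xa8, or a \r\n pair inside the image data, should come back exactly as they were written.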

In your specific case, you are using a function that expects to work on Unicode data, not binary image data, and it is trying to encode that data to binary. In other words, because you are giving it data that is already binary (encoded), the function applies a conversion meant for Unicode (to produce a binary representation) to data that is already binary. Python 2 then first tries to decode the data implicitly, using the default ASCII codec, to get a Unicode value it can encode. It is that implicit decoding that fails here:

>>> '\xa8'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa8 in position 0: ordinal not in range(128)

Note that I encoded, but got a decoding exception.

The code you are using is extremely convoluted. If you want to interpret the whole binary contents of a file as one large integer, you can do so by converting to a hex representation, but there is no need to then convert to a binary string and back to an integer again. The following would suffice:

import binascii

with open(filename, 'rb') as fileobj:
    binary_contents = fileobj.read()
    integer_value = int(binascii.hexlify(binary_contents), 16)
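If you also need to get the original bytes back from the integer, here is a rough sketch of a round trip. The function names bytes_to_int and int_to_bytes are my own, not part of any library, and the sketch assumes you store the original length alongside the integer, because leading zero bytes would otherwise be lost:

```python
import binascii


def bytes_to_int(data):
    # An empty input has no hex digits to parse, so treat it as zero.
    if not data:
        return 0
    return int(binascii.hexlify(data), 16)


def int_to_bytes(number, length):
    # Assumption: `length` is the original byte count, kept by the caller.
    if length == 0:
        return b''
    # Zero-pad to two hex digits per byte so leading zero bytes survive.
    hex_digits = '%0*x' % (2 * length, number)
    return binascii.unhexlify(hex_digits)
```

For example, int_to_bytes(bytes_to_int(data), len(data)) should give back data unchanged.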

Image data is not usually interpreted as one long number, however. Binary data can encode integers, but when processing images you'd usually use the struct module to decode specific integer values from specific bytes instead.
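To illustrate the struct approach, here is a small sketch that pulls the width and height out of a PNG header. The function name png_dimensions is hypothetical; the offsets rely on the PNG layout, where an 8-byte signature is followed by the IHDR chunk (4-byte length, 4-byte type, then width and height as big-endian unsigned 32-bit integers):

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_dimensions(data):
    # Check the signature and that the first chunk is IHDR.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b'IHDR':
        raise ValueError('not a PNG header')
    # '>II' means two big-endian unsigned 32-bit integers.
    width, height = struct.unpack('>II', data[16:24])
    return width, height
```

You would pass it the bytes read from a file opened in binary mode; only the first 24 bytes are needed.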

7 Comments

I attempted to open it using open() and then calling str() on the read data, but to no avail. Is there any way I can get at least an ASCII representation of the file's contents and then convert back when needed?
@jjcyalater: you tagged your question with python-2.7; did you use from io import open perhaps? If trying to read from an open file object throws that exception then you either are using an io.open() call or are using Python 3. Please post your actual code and use the correct version tags. There should be no need to convert data to Unicode objects.
I didn't import any modules; there's a built-in open() in Python 2.7. I'll add my code to the main post ASAP.
@jjcyalater: why are you trying to interpret all of the text in a file as an integer? Your code is hugely convoluted, encoding first (which requires that the text be decoded first, which is why you get your error), then encoding to hex, then decoding to integer, then representing as binary, then decoding back to integer. Get rid of the encode, get rid of the bin() call, get rid of the extra int() step. And why are you trying to interpret a whole image as one integer number in the first place?
I'm just doing it for fun, really. I'm looking into doing things with a numerical representation of a file, like looking at a new way to compress data. But I won't go too far into that... I'll try the things suggested in your post.