I am using an api that only takes strings. It's intended to store things. I would like to be able to read in a binary file, convert the binary data to a string, and store the string. Then I would like to retrieve the string, convery back to binary, and save the file.
so what I am trying to do is (in python):
PDF -> load into program as string -> store string ->retrieve string ->save as binary PDF file
For example, I have a PDF called PDFfile. I want to read it in:
datafile=open(PDFfile,'rb')
pdfdata=datafile.read()
When I read up on the .read function it says that it's supposed to result in a string. It does not, or if it does, its taking the parts that define it as a binary also. I have two lines of code later that prints it out:
print(pdfdata[:20])
print(str(pdfdata[:20]))
The result is this:
b'%PDF-1.3\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\xc4'
b'%PDF-1.3\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\xc4'
Those look like 2 bytes types to me, but apparently, the second one is a string. When I do type(pdfdata) I get bytes.
I am struggling to try to get a clean string that represents the PDF file, that I can then convert back to a bytes format. The API fails if I send it without stringifying it.
str(pdfdata)
I have also tried playing around with encode and decode, but I get errors that encode/decode cant handle 0xc4 which is apparently in the binary file.
The final oddity:
When I store the str(pdfdata) and retrieve it into 'retdata' I print some bytes out of it and compare to the original
print(pdfdata[:20])
print(retdata[:20])
i get really different results
b'%PDF-1.3\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\xc4'
b'%PDF-1.3\n%\xc4\xe
But the data is there, if I show 50 characters of the retdata
b'%PDF-1.3\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\
Needless to say, when I retrieve the data, and store as a pdf, its corrupted and doesn't work.
When I save the stringified pdf and the string version of the retrieved data, they are identical. so the storage and retrieval of a string is working fine.
So I think the corruption is happening when I convert to a string.
I know I'm getting loquacious, but you guys like to have all the info.