I'm trying to compare some byte values. Source A comes from a file that is read in binary mode:
# Read the whole file as raw bytes; the with block closes it automatically
with open(fname, "rb") as f:
    f_data = f.read()
These files can be anything from a few KB to a few MB in size.
Source B is a dictionary of known patterns:
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}
(This list will be extended once the basic process works)
Essentially I'm trying to 'read' the file (source A) and then incrementally inspect its trailing bytes for matches against the pattern list:
testString = f_data[-counter:]
If no match is found, it should increase counter by 1 and try to pattern match against the list again.
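A minimal sketch of the loop described above, assuming both the file data and the patterns stay as byte strings throughout. The function name find_eof_marker and the max_tail cut-off are hypothetical, added only for illustration:

```python
# Hypothetical helper: the name and the max_tail limit are assumptions,
# not part of the original code.
EOF_MARKERS = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',  # the bytes for %%EOF
}


def find_eof_marker(data, markers=EOF_MARKERS, max_tail=1024):
    """Grow a window from the end of data until it contains a known marker.

    Returns (format_type, counter) for the first match, or None if no
    marker is found within the last max_tail bytes.
    """
    limit = min(len(data), max_tail)
    for counter in range(1, limit + 1):
        test_string = data[-counter:]      # still bytes, never decoded
        for format_type, marker in markers.items():
            if marker in test_string:      # bytes-in-bytes comparison
                return format_type, counter
    return None


result = find_eof_marker(b'\x00\x01junk\xff\xd9\x00\x00')
```

Here result would be ('jpg', 4): the marker is found once the window covers the last four bytes. Because nothing is ever decoded, no codec should get involved in the comparison.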
I've tried a number of different ways to get this working. I can get testString to grow correctly, but I keep running into encoding issues where the various approaches want to ASCIIify the bytes in order to undertake the comparison.
I'm a bit lost, and not for the first time I'm wandering around the code changing int to u to b without getting past issues like d9 being a reserved value, which stops me from using the ASCII-type comparison tools, e.g. if format_type in testString: (results in a UnicodeDecodeError: 'ascii' codec can't decode byte a9)
I tried to convert everything to an integer, but that threw one of these errors:
ValueError: invalid literal for int() with base 2: '.'
ValueError: invalid literal for int() with base 10: '.'
I tried to convert testString to hex bytes, but kept getting:
TypeError: hex() argument can't be converted to hex
(This is more my lack of understanding than anything else, I'm sure!)
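For reference, a small sketch of conversions that do accept a byte string, assuming the goal is only to inspect the values. int() and hex() each expect a single number (or numeric text), not a run of raw bytes. The variable names here are illustrative:

```python
import binascii

tail = b'\xff\xd9'  # illustrative value: the jpg marker from eof_markers

# bytearray yields integers in the range 0..255 on Python 2 and Python 3.
as_ints = list(bytearray(tail))                 # [255, 217]

# hexlify converts the whole byte string to hex digits in one call.
as_hex = binascii.hexlify(tail)                 # b'ffd9'

# hex() takes one integer at a time, so apply it per byte.
per_byte = [hex(b) for b in bytearray(tail)]    # ['0xff', '0xd9']
```

None of these conversions should be needed for the comparison itself, since bytes can be compared with bytes directly.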
There are a number of resources I've found that talk about encoding / hex comparisons (e.g. stackoverflow.com/questions/10561923/unicodedecodeerror-ascii-codec-cant-decode-byte-0xef-in-position-1), but I've just not found something that I can either fully understand or that points me down the right path.
I've been stuck on this for a while, so any pointers are gratefully received.
... format_type, etc., are all byte strings? As soon as you try to mix bytes and Unicode, you'll get an immediate error if you're lucky, or an undiagnosable problem one step later if you're not.
... UnicodeDecodeError when you don't think it should be doing any decoding?
... hex function to help. What makes you think it's relevant here?
... str ... I'll look at a complete example, and finally, 3rd, indeed. I'm at the "try anything" stage, but thank you for the explanation as to why it will fail.
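To illustrate the point about not mixing bytes and Unicode, here is a hedged sketch that converts a text pattern to bytes once, explicitly, before any comparison. The helper name to_bytes and the sample values are assumptions for illustration only:

```python
def to_bytes(pattern, encoding='ascii'):
    """Return pattern as bytes, encoding it only if it arrived as text."""
    if isinstance(pattern, bytes):
        return pattern
    return pattern.encode(encoding)


data = b'%PDF-1.4 ... %%EOF\n'          # illustrative file contents
text_pattern = u'%%EOF'                 # hypothetical pattern typed as text

byte_pattern = to_bytes(text_pattern)   # explicit, one-time conversion
found = byte_pattern in data            # bytes in bytes, no implicit decoding
```

Doing the conversion at the boundary means every later comparison sees the same type on both sides.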