I'm using Python 3.2.3 on Windows, and am trying to convert binary data within a C-style ASCII file into its binary equivalent for later parsing using the struct module. For example, my input file contains "0x000A 0x000B 0x000C 0x000D", and I'd like to convert it into "\x00\x0a\x00\x0b\x00\x0c\x00\x0d".

The problem I'm running into is that the string datatypes have changed in Python 3, and the built-in functions for converting from hexadecimal to binary, such as binascii.unhexlify(), no longer accept regular Unicode strings (str), only byte strings (bytes). Converting from Unicode strings to byte strings and back is confusing me, so I'm wondering if there's an easier way to achieve this. Below is what I have so far:

import binascii

with open(path, "r") as f:
    l = []
    data = f.read()
    values = data.split(" ")

    for v in values:
        if v.startswith("0x"):
            # str -> bytes, unhexlify, then decode back to str
            l.append(binascii.unhexlify(bytes(v[2:], "utf-8")).decode("utf-8"))

    string = ''.join(l)
  • Did you try opening the file as binary? 'rb' Commented Oct 7, 2012 at 3:52
  • No, I haven't tried to open the file as binary. My line of thought was that the input file uses quasi-C syntax, so then not only would I need to filter out comments and separators between hexadecimal numbers, but also perform the hexadecimal to binary conversion at the same time, which could get tricky. This is why I ended up opening it in ASCII mode and splitting it into a list based on the space delimiter, because then I could easily loop through and exclude anything that doesn't start with "0x". Commented Oct 7, 2012 at 4:07

2 Answers

>>> ''.join(chr(int(x, 16)) for x in "0x000A 0x000B 0x000C 0x000D".split()).encode('utf-16be')
b'\x00\n\x00\x0b\x00\x0c\x00\r'
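
The one-liner works because each value is turned into a code point, and UTF-16BE stores code points below 0x10000 as two big-endian bytes. It is worth being careful with values in the surrogate range (0xD800 to 0xDFFF), which current Python versions refuse to encode. Since the data is headed for the struct module anyway, packing the integers directly may be more robust. A minimal sketch, assuming every token is a 16-bit big-endian value (the helper name words_to_bytes is hypothetical):

```python
import struct

def words_to_bytes(text):
    """Pack every 0x-prefixed token in text as a big-endian 16-bit word."""
    # Tokens that do not start with "0x" (comments, separators) are skipped.
    values = [int(tok, 16) for tok in text.split() if tok.startswith("0x")]
    # ">" = big-endian, "H" = unsigned 16-bit; one "H" per value.
    return struct.pack(">%dH" % len(values), *values)

packed = words_to_bytes("0x000A 0x000B 0x000C 0x000D")
print(packed)
```

struct.pack raises struct.error if a value does not fit in 16 bits, so oversized tokens are reported rather than silently changing the output length.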

As agf says, opening the file with mode 'r' gives you text (str) data. Since you are only working with binary data here, you probably want to open the file in 'rb' mode and build a result of type bytes rather than str.

Something like:

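import binascii

with open(path, "rb") as f:
    l = []
    data = f.read()
    # split() with no argument splits on any whitespace, including newlines
    values = data.split()

    for v in values:
        # keep only the hex tokens; everything else is skipped
        if v.startswith(b"0x"):
            l.append(binascii.unhexlify(v[2:]))

    result = b''.join(l)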
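
If the file has to stay open in text mode, bytes.fromhex() accepts a regular str, so the round trip between str and bytes can be skipped. A minimal sketch, assuming tokens are whitespace-separated and each has an even number of hex digits (the helper name hex_tokens_to_bytes is made up for illustration):

```python
import struct

def hex_tokens_to_bytes(text):
    """Join the hex digits of all 0x-prefixed tokens and convert them to bytes."""
    digits = "".join(tok[2:] for tok in text.split() if tok.startswith("0x"))
    return bytes.fromhex(digits)

result = hex_tokens_to_bytes("0x000A 0x000B\n0x000C 0x000D\n")
# Later parsing with struct: four big-endian unsigned 16-bit values.
words = struct.unpack(">4H", result)
print(result, words)
```

The unpack format ">4H" is only an example; the real format string depends on the layout of the data being parsed.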
