How to read binary files as hex in Python?

Question

I want to read a file with data, coded in hex format:

01ff0aa121221aff110120...etc

The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).

I tried the following code (and other similar):

filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
  a=f.read(1)
  b=a.encode("hex")
  c.append(b)
f.close()

This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!

This works fine up to (in this case) byte no. 905, which happens to be "1a". I also tried the ord() function, which also stopped at the same byte.

There might be a simple solution?

When you say it stopped, did you get an exception, or what? Also to be clear, this is a binary file that you want to read as a sequence of hex encoded byte values?
– John Carter, Commented Jan 8, 2016 at 23:05
If you're reading a binary file it is good practice to use 'rb' as your flags to open.
– Turn, Commented Jan 8, 2016 at 23:06
I can't come up with any reason this would fail, assuming you're rendering the code accurately. Every discrete byte value (and the empty string, for that matter) encodes as hex just fine for me in Py2 (the hex codec was removed from str.encode in Py3). Try it by itself for every possible character: for c in map(chr, range(256)): print c.encode('hex'). They all work. My answer optimizes to do most of the work at the C layer (in exchange for slightly higher peak memory usage), but your code as given can't break in any way that makes sense. Please give the exact exception or misbehavior.
– ShadowRanger, Commented Jan 8, 2016 at 23:28

ShadowRanger · Accepted Answer · 2018-08-18 00:57:10Z

Simple solution is binascii:

import binascii

# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())

This just gets you a str of the hex values, but it does it much faster than what you're trying to do. If you really want a bunch of length 2 strings of the hex for each byte, you can convert the result easily:

hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))

which will produce the list of len 2 strs corresponding to the hex encoding of each byte. To avoid temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by using the same iterator twice with zip:

hexlist = map(''.join, zip(*[iter(hexdata)]*2))

Update:

For people on Python 3.5 and higher, bytes objects gained a .hex() method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:

with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
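
On Python 3 the map/zip one-liners above behave differently: map returns a lazy iterator rather than a list, and a bytes result from binascii.hexlify yields ints when iterated, so ''.join fails on it. A minimal sketch of one way to get the per-byte two-character strings from the str that .hex() returns; split_hex_pairs is a hypothetical helper name, not part of any library:

```python
def split_hex_pairs(hexdata: str) -> list:
    """Split a hex string such as '01ff0a' into ['01', 'ff', '0a']."""
    # Step through the string two characters at a time; each pair is one byte.
    return [hexdata[i:i + 2] for i in range(0, len(hexdata), 2)]


# Stand-in for the result of f.read().hex() on a file opened with 'rb'.
hexdata = b'\x01\xff\x0a\x1a'.hex()
print(split_hex_pairs(hexdata))  # ['01', 'ff', '0a', '1a']
```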

D-slr8 · Accepted Answer · 2018-08-20 05:04:59Z

3

Just an additional note to these: if you read the file in chunks inside a loop, make sure to break once .read() returns no more data, or the loop will just keep going.

def HexView(path):                               # path: name of the file to dump
    with open(path, 'rb') as in_file:
        while True:
            hexdata = in_file.read(16).hex()     # I like to read 16 bytes in then new line it.
            if len(hexdata) == 0:                # breaks loop once no more binary data is read
                break
            print(hexdata.upper())               # I also like it all in caps.
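
The same chunked loop can be written without an explicit break by using the two-argument form of iter(), which stops when read() returns the sentinel b''. This is only a sketch: iter_hex_lines is a hypothetical name, and the io.BytesIO object stands in for a real file opened with 'rb'.

```python
import io


def iter_hex_lines(binary_file, chunk_size=16):
    """Yield the upper-case hex text of successive chunks of an open binary file."""
    # iter(callable, sentinel) keeps calling read() until it returns b'',
    # which is what a binary file returns at end of file.
    for chunk in iter(lambda: binary_file.read(chunk_size), b''):
        yield chunk.hex().upper()


# In-memory stand-in for a real file opened in binary mode.
sample = io.BytesIO(bytes(range(20)))
for line in iter_hex_lines(sample):
    print(line)
```

Because it is a generator, only one chunk is held in memory at a time, which may matter for the larger files mentioned in the question.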

answered Aug 20, 2018 at 5:04


Dmitry Rubanovich · Accepted Answer · 2023-11-30 14:11:53Z

2

If the file is encoded in hex format, shouldn't each byte be represented by 2 characters? So

c=[]
with open('data.geno','rb') as f:
    b = f.read(2)
    while b:
        c.append(b.decode('hex'))
        b=f.read(2)

or you can even do

with open('data.geno','rb') as f:
    c = list(f.read().decode('hex'))

For example, in Python 2.7.18 this works:

>>> list(b'404040'.decode('hex'))
['@', '@', '@']

Python 3

This won't work in Python 3. In Python 3 you would use the codecs module:

import codecs
with open('data.geno','rb') as f:
    c = list(map(chr, codecs.decode(f.read(), 'hex')))

or (depending on whether you are looking for them as number or as characters)

import codecs
with open('data.geno','rb') as f:
    c = list(codecs.decode(f.read(), 'hex'))

because in Python 3,

>>> import codecs
>>> codecs.decode(b'404040', 'hex')
b'@@@'
>>> list(codecs.decode(b'404040', 'hex'))
[64, 64, 64]
>>> list(map(chr, codecs.decode(b'404040', 'hex')))
['@', '@', '@']

or even ''.join(map(chr, codecs.decode(f.read(), 'hex'))) if you want a string instead of a list.

>>> ''.join(map(chr, codecs.decode(b'404040', 'hex')))
'@@@'
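
In Python 3, bytes.fromhex is another route to the same decoding without the codecs module. A minimal sketch, assuming the file really holds ASCII hex digits (possibly with whitespace such as a trailing newline); decode_hex_text is a hypothetical helper name:

```python
def decode_hex_text(hex_text: str) -> bytes:
    """Turn a string of hex digits such as '404040' into the bytes it encodes."""
    # Drop any whitespace (spaces, newlines) before decoding.
    return bytes.fromhex(''.join(hex_text.split()))


raw = decode_hex_text('404040\n')
print(raw)                  # b'@@@'
print(list(raw))            # [64, 64, 64]
print(raw.decode('ascii'))  # @@@
```

For a file, the text would come from something like open('data.geno', 'r').read(), which is only appropriate if the file contains hex text rather than raw binary data.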

edited Nov 30, 2023 at 14:11

answered Jan 8, 2016 at 23:41


3 Comments

ShadowRanger Over a year ago

The question's grammar is ambiguous; that opening sentence could also mean "I want to read the data and encode it as hex". The rest of the question states they want two-character strings, which favors that interpretation. I'll admit it's rather confusing.

SachaDee Over a year ago

I understood the question the same way. +1

Shayne Feb 24 at 4:10

It seems that Python doesn't know what a hex decoding is: "'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs". I'm guessing that was a Python 2 thing?

Per Persson · Accepted Answer · 2016-01-09 08:28:02Z

0

Thanks for all interesting answers!

The simple solution, which worked immediately, was to change "r" to "rb", so:

f=open('data.geno','r')  # doesn't work
f=open('data.geno','rb')  # works fine
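
One plausible explanation, assuming the script ran under Python 2 on Windows: in text mode the C runtime treats the byte 0x1A (Ctrl-Z) as an end-of-file marker, so read() returns nothing from that point on, while binary mode passes every byte through unchanged. The sketch below is a Python 3 port of the question's loop in binary mode; read_hex_bytes and the throwaway temp file are illustrative, not part of the original code.

```python
import os
import tempfile


def read_hex_bytes(path):
    """Read a file in binary mode and return one two-character hex string per byte."""
    with open(path, 'rb') as f:
        # Iterating over bytes in Python 3 yields ints, formatted here as 2 hex digits.
        return ['%02x' % byte for byte in f.read()]


# Demonstration on a throwaway file that contains the byte 0x1a in the middle.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x01\xff\x1a\xaa\x21')
try:
    print(read_hex_bytes(tmp.name))  # ['01', 'ff', '1a', 'aa', '21']
finally:
    os.remove(tmp.name)
```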

The data in this case is actually coded with only two bits per value, so one byte contains four values, each one of binary 00, 01, 10, or 11.
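
With four 2-bit values packed into each byte, shifting and masking recovers them directly, without going through hex strings. A minimal sketch: unpack_2bit is a hypothetical name, and the bit order (highest two bits first) is an assumption, since the layout of data.geno is not stated here.

```python
def unpack_2bit(data: bytes) -> list:
    """Split each byte into four 2-bit values (0..3), highest bits first (assumed order)."""
    values = []
    for byte in data:
        # Shift by 6, 4, 2, 0 and mask with 0b11 to isolate each pair of bits.
        for shift in (6, 4, 2, 0):
            values.append((byte >> shift) & 0b11)
    return values


# 0x1a is 0b00011010, i.e. the pairs 00, 01, 10, 10.
print(unpack_2bit(b'\x1a'))  # [0, 1, 2, 2]
```

If the file stores the lowest bits first instead, the shift order would be (0, 2, 4, 6).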

Yours!

answered Jan 9, 2016 at 8:28

