Count occurrences in a binary file (Python 2.x)

Question

I want to count the occurrences of a particular header section in a binary file with Python 2.7.3. I have found plenty of examples to count occurrences in .txt type files and to do with lines but little info on counting byte sequences in binaries.

My thinking is that I could take the ASCII characters that appear in the binary and use them as the string to search for.

The header section in hex is "28 00 28 00 28 00" or "( ( ( " in ascii.

I thought the code would be something like this:

total = 0
for line in f:
    if "( ( ( " in line:
        total += 1
f.close()
print "%s" % total

But it doesn't seem to count even once. It will print line, though, and that is 120 characters long.

Martijn Pieters · Accepted Answer · 2015-04-11 13:35:50Z


You have NULL bytes, not spaces. By using '( ( ( ' you are looking for 28 20 28 20 28 20, not 28 00 28 00 28 00.

Use \x00 to create such bytes:

if "(\x00(\x00(\x00" in line:
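To see the difference concretely, here is a small sketch. The sample data is made up for illustration, and the b prefix is optional on Python 2.7 (it is there so the snippet also runs on Python 3):

```python
# Hypothetical sample data: the header bytes 28 00 28 00 28 00
# with a couple of filler bytes on either side.
data = b"\x01\x02(\x00(\x00(\x00\x03\x04"

with_spaces = b"( ( ( "           # bytes 28 20 28 20 28 20
with_nulls = b"(\x00(\x00(\x00"   # bytes 28 00 28 00 28 00

# Only the NUL-separated pattern is actually present in the data.
found_with_spaces = with_spaces in data
found_with_nulls = with_nulls in data
```

A hex editor typically renders both 00 and other non-printable bytes as blanks or dots in its ASCII column, which is why the two patterns look alike there.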

Looping over a binary file in lines may not make sense; this would only work if there were actually \n bytes in that file.

You could search through the file in chunks rather than lines:

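pattern = "(\x00(\x00(\x00"
previous = ''
total = 0
for chunk in iter(lambda: f.read(2048), ''):
    # count every match in this chunk, including one that straddles the boundary
    total += (previous + chunk).count(pattern)
    previous = chunk[-5:]  # carry len(pattern) - 1 bytes so boundary matches are not missed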
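As a fuller sketch of the chunked approach, the function below counts every non-overlapping occurrence rather than the number of chunks that contain one, and should give the same result as calling .count on the whole file contents. The name count_in_chunks and its parameters are hypothetical, and io.BytesIO stands in for a file opened in 'rb' mode:

```python
import io


def count_in_chunks(f, pattern, chunk_size=2048):
    """Count non-overlapping occurrences of pattern in a binary file object."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # longest partial match that can end a window
    tail = b""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = 0
        while True:
            idx = window.find(pattern, pos)
            if idx < 0:
                break
            total += 1
            pos = idx + len(pattern)  # resume after the match just counted
        # Keep only bytes that are not part of a counted match, and no more
        # than could begin a match completed by the next chunk.
        tail = window[max(pos, len(window) - overlap):]
    return total


# Usage with an in-memory stand-in for a real file (made-up data).
header = b"(\x00(\x00(\x00"
sample = b"xx" + header + b"yyy" + header
total = count_in_chunks(io.BytesIO(sample), header, chunk_size=4)
```

Dropping already-counted bytes from the carried-over tail matters because this particular header can overlap with itself; without that step a run such as 28 00 28 00 28 00 28 00 28 00 split across a boundary could be counted twice.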

edited Apr 11, 2015 at 13:35

answered Apr 11, 2015 at 13:22


11 Comments

Python_newbie Over a year ago

Thanks for that, rookie mistake. With the updated if statement the total count is still 0. Would it be better to use bytes rather than "lines" in the for statement?

Martijn Pieters Over a year ago

@Python_newbie: so are you 100% certain those byte sequences are there? For binary files, I'd read in chunks (and take the last 5 bytes from the preceding chunk along for the next test, to ensure you didn't miss a partial match).

Python_newbie Over a year ago

Yes, they sure are. I can find every header instance with the hex editor's "Find selection" search. At a guess there are 1000 x 3 different types of headers, which is why I want a script to count and print a confirmed total. Reading in chunks won't work as the metadata can vary in length, so searching for the header byte sequence is the best option afaik.

Martijn Pieters Over a year ago

@Python_newbie: and the header doesn't contain any length information then?

Python_newbie Over a year ago

I got it to work in the end. I opened the file in 'rb' mode, read the entire file into a variable called data, and then called data.count("(\x00(\x00(\x00"), which returned 1363.
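The approach described in this last comment might look roughly like the sketch below. The function name count_header and the path argument are hypothetical, and reading the whole file at once assumes it fits comfortably in memory:

```python
def count_header(path, pattern=b"(\x00(\x00(\x00"):
    """Read the whole file as bytes and count non-overlapping matches."""
    # 'rb' keeps the bytes untouched; text mode may translate line endings.
    with open(path, "rb") as f:
        data = f.read()
    return data.count(pattern)
```

Calling count_header with the path to the binary file returns the total as an integer. Note that .count is a method on the string (bytes) object and counts non-overlapping occurrences only.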
