python regular expression, extract bytes from listing output

Question

I'm trying to extract the binary opcodes from a listing file generated via the /Fa flag in Visual Studio. The format looks like:

00040   8b 45 bc     mov     eax, DWORD PTR _i$2535[ebp]
  00043 3b 45 c8     cmp     eax, DWORD PTR _code_section_size$[ebp]
  00046 73 19        jae     SHORT $LN1@unpacker_m

where the first number is the address, followed by the opcodes and then the instruction mnemonic. In this case I'd like to get an array of:

8b 45 bc 3b 45 c8 73 19

First I split the line and then run the following regular expression to get bytes:

HEX_BYTE = re.compile("\s*[\da-fA-F]{2}\s*", re.IGNORECASE)

But this regex matches everything. Does anyone have an idea how to do this in a simple way? Thanks, David

You may read it line by line and use ^\d{5}\s+([\da-fA-F]{2}(?:\s+[\da-fA-F]{2})*) to extract the opcodes into group(1), then split on whitespace and append the results to the list.
– Wiktor Stribiżew, Commented Feb 2, 2016 at 9:22
@WiktorStribiżew: There appears to be some whitespace at the beginning of the second and third lines.
– Jan, Commented Feb 2, 2016 at 10:09
@Jan: I changed the format of the question, and I am not sure if those spaces are really there. The OP is keeping silent.
– Wiktor Stribiżew, Commented Feb 2, 2016 at 10:11
I don't think that the leading spaces matter: the file uses a fixed-width field format anyway.
– mhawke, Commented Feb 2, 2016 at 10:13
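For reference, the line-by-line regex suggestion from the first comment could look roughly like the sketch below. It is not the commenter's exact pattern: it assumes optional leading whitespace, allows hex digits in the address, and adds a \b after each byte so that a mnemonic such as "add" is not mistaken for the byte "ad". The names LISTING, OPCODE_LINE and extract_opcodes are made up for illustration.

```python
import re

# Sample lines copied from the question.
LISTING = """\
00040   8b 45 bc     mov     eax, DWORD PTR _i$2535[ebp]
  00043 3b 45 c8     cmp     eax, DWORD PTR _code_section_size$[ebp]
  00046 73 19        jae     SHORT $LN1@unpacker_m
"""

# Assumptions (not in the original comment): leading whitespace is optional,
# the 5-character address may contain hex digits, and every byte must end
# at a word boundary.
OPCODE_LINE = re.compile(
    r'^\s*[\da-fA-F]{5}\s+([\da-fA-F]{2}\b(?:\s+[\da-fA-F]{2}\b)*)'
)


def extract_opcodes(text):
    """Return the opcode bytes of every matching line as a flat list of strings."""
    opcodes = []
    for line in text.splitlines():
        match = OPCODE_LINE.match(line)
        if match:
            # group(1) holds e.g. '8b 45 bc'; split it into single bytes.
            opcodes.extend(match.group(1).split())
    return opcodes


print(extract_opcodes(LISTING))
# ['8b', '45', 'bc', '3b', '45', 'c8', '73', '19']
```

Lines that do not start with an address simply fail to match and are skipped.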

msw · Accepted Answer · 2016-02-02 09:44:20Z

3

Forget regexps; they are overcomplicated for extracting data from fixed-width fields. The statements

line = '  00043 3b 45 c8     cmp     eax,'
print(line[7:19].split())

yield

['3b', '45', 'c8']

You might need to

line = line.expandtabs()

first if there are Tab characters in the input strings.
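To illustrate why the Tab expansion matters, here is a small sketch. The tab-separated line is a hypothetical example (the question does not show where tabs actually occur), and opcode_fields is just an illustrative helper wrapping the slice from above.

```python
def opcode_fields(line, start=7, end=19):
    """Expand tabs to spaces, then slice out the fixed-width opcode field."""
    return line.expandtabs()[start:end].split()


# Hypothetical line that uses tabs instead of runs of spaces.
TABBED = '  00043\t3b 45 c8\t cmp\t eax,'

# Slicing the raw line cuts into the mnemonic, because each tab is one character.
print(TABBED[7:19].split())
# ['3b', '45', 'c8', 'c']

# After expandtabs() the columns line up again (default tab size is 8).
print(opcode_fields(TABBED))
# ['3b', '45', 'c8']
```

The start and end offsets are the ones used above; they would need adjusting if the real file uses different column widths.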

JonnyTieM · Accepted Answer · 2016-02-02 09:27:40Z

0

You could try this one: \s[\da-fA-F]{2}\s[\da-fA-F]{2}(\s[\da-fA-F]{2})?

It would return three results for your example:

" 8b 45 bc"

" 3b 45 c8"

" 73 19"

You would then split each match on whitespace to get the result you described.
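A possible way to run this pattern in Python is sketched below. Note that re.findall would return only the contents of the optional capturing group, so the sketch uses re.finditer and group(0) to get the whole match. TEXT and PATTERN are illustrative names, and the pattern only covers instructions of two or three bytes.

```python
import re

# Sample lines copied from the question.
TEXT = (
    "00040   8b 45 bc     mov     eax, DWORD PTR _i$2535[ebp]\n"
    "  00043 3b 45 c8     cmp     eax, DWORD PTR _code_section_size$[ebp]\n"
    "  00046 73 19        jae     SHORT $LN1@unpacker_m"
)

PATTERN = re.compile(r'\s[\da-fA-F]{2}\s[\da-fA-F]{2}(\s[\da-fA-F]{2})?')

# group(0) is the full match, including the leading whitespace character.
groups = [m.group(0) for m in PATTERN.finditer(TEXT)]
print(groups)
# [' 8b 45 bc', ' 3b 45 c8', ' 73 19']

# Split each match on whitespace and flatten into a single list of bytes.
opcodes = [byte for group in groups for byte in group.split()]
print(opcodes)
# ['8b', '45', 'bc', '3b', '45', 'c8', '73', '19']
```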

mhawke · Accepted Answer · 2016-02-02 10:07:14Z

Looking at the file sample in the question, it appears to consist of fixed-width fields, so you should be able to extract those values using fixed offsets into each line:

with open('listing.txt') as listing:
    opcodes = [opcode for line in listing for opcode in line[8:16].split()]

>>> opcodes
['8b', '45', 'bc', '3b', '45', 'c8', '73', '19']

The above uses a list comprehension to pluck out the required fields, which are known to sit in columns 8 through 15 (the slice end, 16, is exclusive), using nothing but a slice operation and a split(). This ought to be a great deal faster than a regular expression, and it is a great deal more readable.

If you want the opcodes as integers:

with open('listing.txt') as listing:
    opcodes = [int(opcode, 16) for line in listing for opcode in line[8:16].split()]

>>> opcodes
[139, 69, 188, 59, 69, 200, 115, 25]
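If the goal is the raw machine code rather than a list, the same slice can feed bytes.fromhex. This is only a sketch: it uses io.StringIO with the sample lines from the question in place of the real listing.txt, and the names hex_fields and machine_code are made up here.

```python
import io

# Stand-in for open('listing.txt'), using the sample lines from the question.
listing = io.StringIO(
    "00040   8b 45 bc     mov     eax, DWORD PTR _i$2535[ebp]\n"
    "  00043 3b 45 c8     cmp     eax, DWORD PTR _code_section_size$[ebp]\n"
    "  00046 73 19        jae     SHORT $LN1@unpacker_m\n"
)

# Same fixed-width slice as above; strip the padding of short instructions.
hex_fields = [line[8:16].strip() for line in listing]

# bytes.fromhex accepts hex digit pairs separated by spaces.
machine_code = bytes.fromhex(" ".join(hex_fields))

print(machine_code.hex())
# 8b45bc3b45c87319
```

A bytes object can then be written to a file opened in binary mode or compared against another buffer directly.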

Jan · Accepted Answer · 2016-02-02 10:33:52Z

0

A Python example with the help of regular expressions:

import re
string = """00040   8b 45 bc     mov     eax, DWORD PTR _i$2535[ebp]
  00043 3b 45 c8     cmp     eax, DWORD PTR _code_section_size$[ebp]
  00046 73 19        jae     SHORT $LN1@unpacker_m"""

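# Avoid shadowing the built-in name `bytes`; list() is needed on Python 3,
# where map() returns a lazy iterator.
opcode_groups = list(map(str.strip, re.findall(r'((?:\b[\da-fA-F]{2}\b\s+)+)', string)))
print(opcode_groups)
# ['8b 45 bc', '3b 45 c8', '73 19']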
