I have this disassembly output:

/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);
|           ; var int32_t var_324h @ ebp-0x324
|           ; arg int32_t arg_4h @ ebp+0x4
|           ; arg int32_t arg_8h @ ebp+0x8
|           0x004020b0      55             push ebp
|           0x004020b1      8bec           mov ebp, esp
|           0x004020b3      81ec24030000   sub esp, 0x324
|           0x004020b9      6a17           push 0x17                   ; 23
|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c
|           0x004020c1      85c0           test eax, eax
|       ,=< 0x004020c3      7407           je 0x4020cc
|       |   0x004020c5      b902000000     mov ecx, 2
|       |   0x004020ca      cd29           int 0x29
|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3
|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0
|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0
|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0

and I want to extract the opcodes:

55
8bec
81ec24030000
6a17
--snip--

until I have the full opcode string:

558bec81ec240300006a17--snip--

How can I do this in Python using a regex? I tried 0x[0-9a-z]\ *(.*?)\ + but it didn't work.

  • You should show your python code - preferably a minimal reproducible example. Commented Jan 14, 2022 at 18:26
  • Your character class [0-9a-z] should probably have some number of repetitions. Otherwise it will only match a single hex digit. I recommend testing your regex on regex101.com Commented Jan 14, 2022 at 18:27
  • Maybe 0x[0-9A-Fa-f]{8} *(\S+)? Commented Jan 14, 2022 at 18:27
  • Or 0x[0-9a-f]+[^\S\n]+([a-f0-9]+)[^\S\n] Commented Jan 14, 2022 at 18:27
  • Thank you @WiktorStribiżew and The fourth bird for this. Commented Jan 14, 2022 at 18:35

1 Answer

You can use any of these patterns:

0x[0-9a-fA-F]{8} *(\S+)
0x[0-9a-fA-F]{8}[\t ]*(\S+)
0x[0-9a-fA-F]{8}[^\S\n]*(\S+)

See the regex demo. Details:

  • 0x - the literal text 0x
  • [0-9a-fA-F]{8} - eight hex chars (the address)
  • " *" (a space followed by *) - zero or more spaces
  • [\t ]* - zero or more spaces/tabs
  • [^\S\n]* - zero or more whitespace chars other than LF (line feed, "\n")
  • (\S+) - Group 1: one or more non-whitespace chars (the opcode column)
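
The three variants differ only in which whitespace they accept between the address and the opcode column. As a minimal sketch of that difference, the snippet below runs all three against a hypothetical tab-separated line (the data in the question uses spaces, so this input is an assumption made purely for illustration):

```python
import re

# Hypothetical input: one disassembly line separated by tabs instead of spaces.
line = "0x004020b0\t55\tpush ebp"

patterns = [
    r'0x[0-9a-fA-F]{8} *(\S+)',        # accepts spaces only
    r'0x[0-9a-fA-F]{8}[\t ]*(\S+)',    # accepts spaces or tabs
    r'0x[0-9a-fA-F]{8}[^\S\n]*(\S+)',  # accepts any whitespace except LF
]

# Collect the Group 1 captures for each pattern.
results = [re.findall(p, line) for p in patterns]
for p, r in zip(patterns, results):
    print(p, '->', r)
# The first pattern finds nothing on tab-separated input; the other two capture '55'.
```

If the dump is known to use spaces only, the first pattern is enough; the other two are the safer choice when the separator is not guaranteed.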

See the Python demo:

import re
text = "/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);\n|           ; var int32_t var_324h @ ebp-0x324\n|           ; arg int32_t arg_4h @ ebp+0x4\n|           ; arg int32_t arg_8h @ ebp+0x8\n|           0x004020b0      55             push ebp\n|           0x004020b1      8bec           mov ebp, esp\n|           0x004020b3      81ec24030000   sub esp, 0x324\n|           0x004020b9      6a17           push 0x17                   ; 23\n|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c\n|           0x004020c1      85c0           test eax, eax\n|       ,=< 0x004020c3      7407           je 0x4020cc\n|       |   0x004020c5      b902000000     mov ecx, 2\n|       |   0x004020ca      cd29           int 0x29\n|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3\n|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0\n|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0\n|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0"
print(re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text))
# => ['55', '8bec', '81ec24030000', '6a17', 'ff151c304000', '85c0', '7407', 'b902000000', 'cd29', 'a340744000', '890d3c744000', '891538744000']
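
The question asks for a single concatenated string rather than a list, so the result of re.findall still has to be joined. A minimal sketch, assuming a shortened three-line sample of the question's data (the variable names here are illustrative, not from the original answer):

```python
import re

# Shortened sample: the first three instruction lines from the question.
sample = (
    "|           0x004020b0      55             push ebp\n"
    "|           0x004020b1      8bec           mov ebp, esp\n"
    "|           0x004020b3      81ec24030000   sub esp, 0x324\n"
)

# Same pattern as above: address, spaces/tabs, then the opcode column in Group 1.
opcodes = re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', sample)

# Concatenate the per-instruction opcodes into one hex string.
hex_string = "".join(opcodes)
print(hex_string)  # 558bec81ec24030000

# Optionally turn the hex string into raw bytes for further processing.
raw = bytes.fromhex(hex_string)
print(len(raw))  # 9
```

Operands such as 0x324 are not picked up because the pattern requires exactly eight hex digits after 0x, which only the address column has in this output.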