I have a buffer (an array of chars) that I am using to read data in from a socket, which contains an HTTP request. I have some regular expressions that work nicely for extracting relevant info from strings, and I am looking for a way to use those regular expressions to extract the same info from an array instead, without having to build a string out of the array. Is this possible with ctypes? This is an example of how I am getting the data right now.

import socket, array, ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')
buff = array.array('c', '\0'*4096)
a, b = socket.socketpair()
fd = a.fileno()
buff_pointer = buff.buffer_info()[0]
b.send('a'*100)
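# wrap the address in c_void_p; a bare Python int is passed as a C int and can be truncated on 64-bit builds
bytes_read = libc.recv(fd, ctypes.c_void_p(buff_pointer), len(buff), 0)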
print buff #prints a zeroed array of length 4096 with 100 chars of 'a' in front

This is purely for fun/for lulz btw, inb4 it's unpythonic.

  • Dunno if it's officially supported, but when I try it, re seems to support searching in anything that supports the buffer interface. That includes array.array instances. Commented May 28, 2014 at 4:22
  • Alternatively, buff = bytearray(4096); bytes_read = a.recv_into(buff). Commented May 28, 2014 at 4:27
  • @eryksun yeah, I am aware of that method, I am just using ctypes for kicks. Commented May 28, 2014 at 4:39
  • OK, then I suggest using a ctypes array such as buff = (ctypes.c_char * 4096)(). Then you don't have to get buff_pointer, unless you're doing that for fun, too. Commented May 28, 2014 at 4:52
  • The pattern needs to be hashable because re caches them. Commented May 28, 2014 at 7:02
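For anyone who wants the non-ctypes route mentioned in the comments, here is a minimal sketch of recv_into plus a regex run directly on the buffer. It assumes Python 3 (bytes patterns, bytes-like buffers); the request text, the pattern, and the names view, method, path and version are made up for illustration.

```python
import re
import socket

request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

a, b = socket.socketpair()
try:
    b.sendall(request)

    # Preallocated, writable buffer; recv_into fills it in place.
    buff = bytearray(4096)
    bytes_read = a.recv_into(buff)

    # Slice a memoryview so only the received bytes are searched,
    # without copying them into a new bytes object.
    view = memoryview(buff)[:bytes_read]

    # A bytes pattern is required when matching a bytes-like object.
    m = re.match(rb"(\w+) (\S+) HTTP/(\d\.\d)\r\n", view)
    method, path, version = m.groups()
finally:
    a.close()
    b.close()
```

The groups come back as bytes objects, so a copy is made only for the pieces the pattern actually captures.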

1 Answer

Just run your regexes on the array object, e.g.

>>> import re
>>> m = re.match('^aaaaa', buff)
>>> m
<_sre.SRE_Match object at 0x7fd4cd2cd030>
>>> m.group()
array('c', 'aaaaa')
>>> m.string[m.start():m.end()]
array('c', 'aaaaa')

etc...
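The session above is Python 2, where the 'c' typecode exists. A rough Python 3 equivalent, as a sketch only: it assumes typecode 'B' (unsigned bytes) stands in for 'c' and that the pattern is a bytes literal. The names payload, matched and same_span are illustrative.

```python
import array
import re

# Zeroed 4096-byte array with 100 bytes of 'a' at the front,
# mirroring the buffer from the question.
buff = array.array('B', bytes(4096))
payload = b'a' * 100
buff[:len(payload)] = array.array('B', payload)

# The array exposes the buffer interface, so re can scan it directly.
m = re.match(b'^aaaaa', buff)

matched = m.group()                        # a bytes object in Python 3
same_span = m.string[m.start():m.end()]    # a slice of the original array
```

Note that in Python 3 m.group() gives back bytes rather than an array, while m.string is still the original array object.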

2 Comments

Yes, the _sre extension module should work on any contiguous char or wchar_t buffer. See getstring in _sre.c.
Wide characters are 4 bytes on most POSIX builds, so you can use a buffer with either single-byte elements or four-byte elements. Wide characters are 2 bytes on Windows. re factors the character size into its iteration and pattern matching: re.match(b'ab', (ctypes.c_uint32 * 2)(97, 98)).group() == [97, 98].
