
I have a program where I need to compile several thousand large regexes, all of which will be used many times. The problem is that it takes too long (113 secs, according to cProfile) to re.compile() them all. (By contrast, actually searching with all of these regexes takes < 1.3 secs once they're compiled.)
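A minimal sketch of how that kind of measurement can be taken with cProfile, with a made-up pattern list standing in for the real expressions:

import cProfile
import re

# Made-up stand-in for the real list of several thousand large regexes.
patterns = [r"term%d\s+(foo|bar)\w*" % i for i in range(5000)]

def compile_all():
    return [re.compile(p) for p in patterns]

# Profile just the compilation step.
cProfile.runctx("compile_all()", globals(), locals())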

If I don't precompile, it just postpones the problem to when I actually search, since re.search(expr, text) implicitly compiles expr. Actually, it's worse, because re is going to recompile the entire list of regexes every time I use them.
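(To be precise, re keeps a small internal cache of compiled patterns, but the cache is capped at a fixed size, so with thousands of distinct patterns the entries keep getting evicted and the implicit compilation is paid again and again. The difference, with a made-up pattern:)

import re

text = "some text to scan"
pattern = r"(foo|bar)\d+"  # made-up example

# Implicit: re.search() compiles the pattern string on the fly, relying
# on re's small internal cache to avoid repeating the work.
re.search(pattern, text)

# Explicit: compile once, keep the pattern object, and reuse it.
compiled = re.compile(pattern)
compiled.search(text)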

I tried using multiprocessing, but that actually slows things down. Here's a small test to demonstrate:

## rgxparallel.py ##
import re
import multiprocessing as mp

def serial_compile(strings):
    # Compile everything in the main process, one pattern at a time.
    return [re.compile(s) for s in strings]

def parallel_compile(strings):
    # Compile in a pool of worker processes, one pattern per task.
    print("Using {} processors.".format(mp.cpu_count()))
    pool = mp.Pool()
    result = pool.map(re.compile, strings)
    pool.close()
    pool.join()
    return result

# Stand-in for the real pattern list (Python 2: map returns a list).
l = map(str, xrange(100000))

And my test script:

#!/bin/sh
python -m timeit -n 1 -s "import rgxparallel as r" "r.serial_compile(r.l)"
python -m timeit -n 1 -s "import rgxparallel as r" "r.parallel_compile(r.l)"
# Output:
#   1 loops, best of 3: 6.49 sec per loop
#   Using 4 processors.
#   Using 4 processors.
#   Using 4 processors.
#   1 loops, best of 3: 9.81 sec per loop

I'm guessing that the parallel version is:

  1. In parallel, compiling and pickling the regexes, ~2 secs
  2. In serial, un-pickling, and therefore recompiling them all, ~6.5 secs

Together with the overhead of starting and stopping the processes, multiprocessing on 4 processors ends up roughly 50% slower than serial (9.81 secs vs. 6.49 secs).
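(The un-pickling guess is based on how compiled patterns seem to be pickled: as far as I can tell, the pickle records only the pattern string and flags, and re.compile() runs again on load. A quick check:)

import pickle
import re

p = re.compile(r"(foo|bar)\d+")

# Round-tripping a compiled pattern through pickle stores only the
# pattern string and flags; unpickling calls re.compile() again, so the
# compile work is repeated in the parent when pool.map() returns.
p2 = pickle.loads(pickle.dumps(p))
print(p2.pattern)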

I also tried divvying up the list of regexes into 4 sub-lists, and pool.map-ing the sublists, rather than the individual expressions. This gave a small performance boost, but I still couldn't get better than ~25% slower than serial.
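For reference, a sketch of what that chunked variant looks like (repeating the imports and serial_compile from rgxparallel.py above so it stands alone):

import re
import multiprocessing as mp

def serial_compile(strings):
    return [re.compile(s) for s in strings]

def chunked_parallel_compile(strings, n_chunks=4):
    # Compile one sub-list per worker instead of one pattern per task,
    # so there are only n_chunks pickled results instead of thousands.
    chunks = [strings[i::n_chunks] for i in range(n_chunks)]
    pool = mp.Pool(n_chunks)
    compiled_chunks = pool.map(serial_compile, chunks)
    pool.close()
    pool.join()
    # Flatten the per-chunk results back into a single list.
    return [rgx for chunk in compiled_chunks for rgx in chunk]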

Is there any way to compile faster than serial?

EDIT: Corrected the running time of the regex compilation.

I also tried using threading, but due to the GIL, only one processor was used. It was slightly better than multiprocessing (130 secs vs. 136 secs), but still slower than serial (113 secs).
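The threaded attempt was something along these lines (each thread compiles a slice of the list, but the GIL serializes most of the work):

import re
import threading

def threaded_compile(strings, n_threads=4):
    # One thread per slice of the pattern list; the GIL means the
    # threads mostly take turns, so this ends up close to serial.
    results = [None] * n_threads

    def worker(idx, chunk):
        results[idx] = [re.compile(s) for s in chunk]

    threads = [threading.Thread(target=worker, args=(i, strings[i::n_threads]))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten the per-thread results back into a single list.
    return [rgx for chunk in results for rgx in chunk]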

EDIT 2: I realized that some regexes were likely to be duplicated, so I added a dict for caching them. This shaved off ~30 sec. I'm still interested in parallelizing, though. The target machine has 8 processors, which would reduce compilation time to ~15 secs.
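The dedup cache is just a dict keyed on the pattern string (and flags), roughly:

import re

_compile_cache = {}

def cached_compile(pattern, flags=0):
    # Compile each distinct (pattern, flags) pair only once and reuse it.
    key = (pattern, flags)
    if key not in _compile_cache:
        _compile_cache[key] = re.compile(pattern, flags)
    return _compile_cache[key]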

  • How come you have so many large regexes and only do so little searching with them? Can you simplify them, perhaps replace them with plain old string manipulation, or avoid running some of them at all? Commented Aug 15, 2013 at 23:06
  • The time for searching is for a single use of the entire list. It is very important that the time for a single list search is small, because the user (and my employer) will be expecting near-instant response. I tried simplifying as much as I could, and this is the best I could get without cutting out major features. (The actual list of search terms is ~200,000 items; I have code that switches to simple string functions whenever possible, but that still leaves ~5,000 regexes.) Commented Aug 15, 2013 at 23:15
  • Have you tried using threads instead? 1 thread per CPU and the regexes divvied up among them? regex is implemented in C so you should get a decent level of parallelism despite the GIL. Commented Aug 15, 2013 at 23:23
  • I should link that xkcd.com/1171 =) Commented Aug 15, 2013 at 23:26
  • I was going to try that, but I was put off by this warning in the threading docs (I'm using CPython): "In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously." Commented Aug 15, 2013 at 23:26

2 Answers

0

There are lighter-weight options than multiprocessing for getting asynchronous task execution, such as threads and coroutines. Though Python 2 is limited in its ability to run things concurrently, Python 3 makes heavy use of such asynchronous implementations in its fundamental types. Just run your code with Python 3 and you will see the difference:

$ python2 --version
Python 2.7.17
$ python2 -m timeit -n 1 -s "import rgxparallel as r" "r.serial_compile(r.l)"
1 loops, best of 3: 3.65 sec per loop
$ python2 -m timeit -n 1 -s "import rgxparallel as r" "r.parallel_compile(r.l)"
1 loops, best of 3: 3.95 sec per loop

$ python3 --version
Python 3.6.9
$ python3 -m timeit -n 1 -s "import rgxparallel as r" "r.serial_compile(r.l)"
1 loops, best of 3: 0.72 usec per loop
$ python3 -m timeit -n 1 -s "import rgxparallel as r" "r.parallel_compile(r.l)"
...
1 loops, best of 3: 4.51 msec per loop

Do not forget to change xrange to range for the Python 3 version.
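For example, a Python 3 friendly version of the test list (note that map() is lazy on Python 3, so build a real list if you want to iterate it more than once):

# Python 3: build a concrete list of pattern strings instead of a lazy map object.
l = [str(i) for i in range(100000)]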


-2

As much as I love Python, I think the solution is to do it in Perl (see this speed comparison, for example), or C, etc.

If you want to keep the main program in Python, you could use subprocess to call a Perl script (just make sure to pass as many patterns as possible in as few subprocess calls as possible, to avoid per-call overhead).
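A minimal sketch of that idea, assuming a hypothetical match.pl that reads one pattern per line on stdin and writes its results to stdout:

import subprocess

patterns = [r"foo\d+", r"(bar|baz)+"]  # stand-in for the real pattern list

# Ship the whole batch of patterns to a single Perl process in one call,
# rather than paying subprocess startup cost per pattern.
proc = subprocess.Popen(["perl", "match.pl"],   # hypothetical helper script
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
out, _ = proc.communicate("\n".join(patterns).encode("utf-8"))
print(out.decode("utf-8"))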

