1

I'm doing the below in Python

I have a problem that involves checking a large list(size n) of input strings to see if the substrings contain any words from a large dictionary(size m).

I've looked around for efficient algorithms for this problem and found these: https://github.com/laurentluce/python-algorithms/blob/master/algorithms/string_matching.py

the Rabin-Karp and KMP matching algorithms.(note that I've replaced the ord() function for the Rabin-Karp with a dictionary for efficiency)

However, these actually perform slower than using an 'in' operation in Python which uses the Boyer–Moore–Horspool algorithm. I suppose that this is because the contains() method invoked by 'in' was is implemented in C.

How can I override this method with the Rabin-Karp for the string class in Python in C?

2 Answers 2

2

You could have a look at cython:

http://docs.cython.org/src/quickstart/cythonize.html

I find it easier to write some custom code for a specific operation than overriding a core python structure.

Sign up to request clarification or add additional context in comments.

2 Comments

And of course it might be worth trying PyPy, which offers significant speedups against standard CPython
Thx for the update, I didn't use CPython in a long time, but a quick google search shows things moved quite a bit. PyPy seems the way to go !
1

You can't, I am sorry to say.

The built-in types, being constructed at the time the interpreter is compiled, cannot be patched at run-time. If speed is really so important then you might want to write a C extension type that subclasses the built-in string type but with a different contains method.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.