I have a Python script with roughly 100 regexes, one per line, each matching certain words.
Unsurprisingly, the script consumes up to 100% CPU every time it runs (I basically pass it a sentence and it returns any matched words it finds).
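Here's roughly what the script does now (a simplified sketch; the real one has ~100 patterns, and find_words is just an illustrative name):

import re

# one compiled pattern per word -- the real script has around 100 of these
patterns = [re.compile(w, re.IGNORECASE) for w in ('hello', 'good-bye', 'red', 'blue')]

def find_words(sentence):
    found = []
    for p in patterns:  # each pattern means another full scan of the sentence
        found.extend(p.findall(sentence))
    return found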
I want to combine these into around 4 or 5 "compiled" regex patterns, something like:
>>> words = ('hello', 'good-bye', 'red', 'blue')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
How many words can I safely put in one pattern like this, and would it make a difference? Right now, if I run a loop over a thousand random sentences, it processes maybe 10 per second; I'm looking to increase that drastically, to something like 500 per second (if possible).
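In case it matters, this is roughly how I'd build and time one combined pattern (a minimal sketch; the word list and sentences are placeholders, and re.escape/\b are my guesses at handling punctuation and whole-word matching):

import re
import time

words = ('hello', 'good-bye', 'red', 'blue')  # placeholder; imagine ~100 entries here
# re.escape keeps characters like '-' literal; \b restricts matches to whole words
pattern = re.compile(r'\b(' + '|'.join(re.escape(w) for w in words) + r')\b', re.IGNORECASE)

sentences = ['Say hello to the red balloon.', 'Good-bye, blue sky.'] * 500
start = time.time()
for s in sentences:
    pattern.findall(s)
print '%d sentences in %.3f seconds' % (len(sentences), time.time() - start)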
Also, is it possible to do the same with a list of patterns like this?
>>> words = (r'\d{4}\.\d{2}\.\d{2}', r'\d{2}\s\d{2}\s\d{4}\.')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
>>> print pattern.findall("Today is 2010 11 08")
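For reference, here's the kind of output I'm after, using a made-up string that fits the second pattern (note the trailing period it requires):

>>> print pattern.findall("Backup ran on 08 11 2010.")
['08 11 2010.']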