I have a script that analyzes javascript in links, recursively, so if it finds a javascript, then it analyzes the javascript, and if the javascript it's analyzing contains more javascript then it keeps going, so on so forth. However, I've encountered issues where this recursion would never stop, is there a way to add a timeout for this recursion?
4 Answers
Python has a built-in recursion limit and will raise a RuntimeError if this is exceeded. By default its stack limit is 1000. So:
try:
func_that_may_recurse_infinitely() # i.e., your JavaScript crawler func
except RuntimeError as e:
if "recursion" in str(e):
print "stop all the downloadin'!"
You may modify the initial recursion limit with sys.setrecursionlimit() if you need it to be deeper or shallower.
However, a better approach may be to keep a set() of the items you've already seen and simply decline to process any you've already processed. This can keep you from getting into recursive situations in the first place.
5 Comments
I tend to agree with kindall. If, however, you did want to limit the depth of recursion, you could do something like this:
def foo(max_depth = 10, cur_depth=0):
if cur_depth >= max_depth:
return BASE_CASE
else:
return foo(max_depth, cur_depth+1)
1 Comment
A quick&dirty solution is to just call time.time() and compare. For example, let's say you've got a simple factorial function like this:
def fact(i):
if i == 0 or i == 1: return 1
return i * fact(i-1)
If you call fact(-1), this will spin for a while and then raise a RuntimeError because of maximum recursion depth.
You can add a timeout like this:
import time
def factWithTimeout(i, timeout):
def fact(i, endtime):
if time.time() > endtime:
raise RuntimeError('Timeout')
if i == 0 or i == 1: return 1
return i * fact(i-1, endtime)
return fact(i, time.time() + timeout)
Now, if you call factWithTimeout(-1, 0.0001), it'll only spin for about 100us and then quit with RuntimeError because of timeout.
Obviously, for a function so trivial that it hits the recursion limit in under a millisecond, this isn't much different, but for a more realistic function, that won't be an issue.
Comments
You could do something like this:
import time
start = time.time()
timeout_limit = 30 # 30 seconds, or some other number.
def foo():
if time.time() > start + timeout_limit:
return 0
# insert code here.
foo()
...and if you don't want global variables, you could try this:
class Foo(object):
def __init__(self, timeout_limit):
self.timeout_limit = timeout_limit
def run(self, ...):
self.start = time.time()
self._run(...)
def _run(self, ...):
if time.time() > self.start + self.timeout_limit:
return
# insert code here.
self._run(...)
...although that might be overkill.