1

I have a script that analyzes javascript in links, recursively, so if it finds a javascript, then it analyzes the javascript, and if the javascript it's analyzing contains more javascript then it keeps going, so on so forth. However, I've encountered issues where this recursion would never stop, is there a way to add a timeout for this recursion?

1
  • Care to share some code? Commented Jul 17, 2012 at 18:00

4 Answers 4

6

Python has a built-in recursion limit and will raise a RuntimeError if this is exceeded. By default its stack limit is 1000. So:

try:
    func_that_may_recurse_infinitely()   # i.e., your JavaScript crawler func
except RuntimeError as e:
    if "recursion" in str(e):
        print "stop all the downloadin'!"

You may modify the initial recursion limit with sys.setrecursionlimit() if you need it to be deeper or shallower.

However, a better approach may be to keep a set() of the items you've already seen and simply decline to process any you've already processed. This can keep you from getting into recursive situations in the first place.

Sign up to request clarification or add additional context in comments.

5 Comments

Oh, this is very useful. But I think for my purposes I would like the depth version more. Thank you!
The memoization idea is neat, but it won't necessarily help here. If the cause of the infinite recursion is that the JS is actually evaluating to itself, it's fine; if the cause of the recursion is that the JS is evaluating to a slightly longer string of JS each time, it'll still go forever.
Hence the "may." I don't know that the "analysis" consists of so I don't know how practical it is for the problem at hand.
@abarnert yes, that is exactly the problem, the js in the particular sample I was working on refers to each other, causing an endless recursion.
Well, if the first one evals to exactly the second, and the second to exactly the first, then the trick kindall suggests will be able to detect the loop and bail out after 2 steps instead of having to go through the loop N times or N seconds—not just saving lots of time and CPU, but also meaning you can be sure it really is an infinite loop, rather than just a finite one that's a hair too deep for your limits.
2

I tend to agree with kindall. If, however, you did want to limit the depth of recursion, you could do something like this:

def foo(max_depth = 10, cur_depth=0):
    if cur_depth >= max_depth:
        return BASE_CASE

    else:
        return foo(max_depth, cur_depth+1)

1 Comment

Oh question, the if depth >=max_depth, I assume you mean cur_depth?
2

A quick&dirty solution is to just call time.time() and compare. For example, let's say you've got a simple factorial function like this:

def fact(i):
  if i == 0 or i == 1: return 1
  return i * fact(i-1)

If you call fact(-1), this will spin for a while and then raise a RuntimeError because of maximum recursion depth.

You can add a timeout like this:

import time

def factWithTimeout(i, timeout):
  def fact(i, endtime):
    if time.time() > endtime:
      raise RuntimeError('Timeout')
    if i == 0 or i == 1: return 1
    return i * fact(i-1, endtime)
  return fact(i, time.time() + timeout)

Now, if you call factWithTimeout(-1, 0.0001), it'll only spin for about 100us and then quit with RuntimeError because of timeout.

Obviously, for a function so trivial that it hits the recursion limit in under a millisecond, this isn't much different, but for a more realistic function, that won't be an issue.

Comments

1

You could do something like this:

import time

start = time.time()
timeout_limit = 30   # 30 seconds, or some other number.

def foo():
    if time.time() > start + timeout_limit:
        return 0

    # insert code here.

    foo()

...and if you don't want global variables, you could try this:

class Foo(object):
    def __init__(self, timeout_limit):
        self.timeout_limit = timeout_limit

    def run(self, ...):
        self.start = time.time()
        self._run(...)

    def _run(self, ...):
        if time.time() > self.start + self.timeout_limit:
            return

        # insert code here.

        self._run(...)

...although that might be overkill.

2 Comments

Yea that's slightly more complicated than what I bargained for.
In this case, I think it makes more sense to use a local function and a closure (as in my answer) than an object, because the state is so trivial, and there's no need for an interface with multiple methods. But it is more explicit this way, especially to people who have more OO than FP background.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.