As a pure functional programming language, Haskell intensively uses recursion. Do stack overflow errors occur in Haskell, like in Java? Why, or why not?
2 Answers
Haskell uses the stack differently from Java, due to laziness.
In Java, a stack frame is created when a method is called, and freed when the method returns. So if f() is a recursive method, each recursive call to f() generates a stack frame, and these frames are strictly nested. You can get a stack overflow when you have a deep chain of recursive calls, like f() -> f() -> f() -> ….
Whereas in Haskell, a thunk is created when a function is called. A stack frame is created when a thunk is forced using pattern-matching (e.g. case), and freed when evaluation of the thunk is complete enough to return a value (which may contain more unevaluated thunks).
So if f is a recursive function, each call to f generates a thunk, and case on the result of this generates a stack frame, but these frames are only nested when there’s a dependency between thunks. And in fact, that’s what the seq primitive does: a `seq` b means “evaluate a before b, returning b”, but you can also think of it as adding a dependency of b on a, so when b is evaluated, a is also forced.
You can get a stack overflow when you have a deep chain of thunks to evaluate, for example in the excessively lazy foldl function:
foldl (+) 0 [1..5]
==
foldl (+) 0 (1 : 2 : 3 : 4 : 5 : [])
==
((((0 + 1) + 2) + 3) + 4) + 5
This generates a chain of thunks like so:
((+)
((+)
((+)
((+)
((+)
0
1)
2)
3)
4)
5)
When we force the result (for example, by printing it) we need to descend all the way down this chain in order to be able to start evaluating it, at the (+) 0 1 thunk.
So foldl often produces stack overflows for large inputs, which is why most of the time you should use foldl' (which is strict) when you want a left-associative fold. Instead of building a chain of nested thunks, foldl' evaluates the intermediate results immediately (0+1 = 1, 1+2 = 3, 3+3 = 6, …).
3 Comments
foldl (+) is the fault of both + and foldl; because + is fully strict, the outermost + think can't do anything without forcing the thunk in one of its arguments, and so on along the whole chain. But if the operation you gave to foldl was instead building a data structure with more structure than an integer, it might return a constructor without forcing the entire chain. You'd still get as many thunks as list cells since foldl doesn't generate the outermost one to return until it's consumed the entire list, but they wouldn't have to blow the stack.+ in terms of how productive it is, rather than in terms of strictness (though maybe they are one and the same here), 2) + can be productive if we're dealing with peano numbers, say :-) 3) also worth pointing out that in that case (where you're passing a productive function, rather than + on Ints) foldr is still better (if it's possible to use) since you then avoid the one-thunk-per-cell you mentionfoldr and foldl is one of semantics, not just efficiency. Generally I'd prefer to choose based on that first, and only jump through hoops trying to fit foldr where foldlis more natural if it turns out I need to, rather than have "foldr is better " as a blanket rule. Most of the time it's an unnecessary micro optimisation (occasionally very necessary to get something that runs acceptably at all).Stack overflows won't occur in Haskell like they would in Java, etc., because evaluation and function calls happen differently.
In Java and C and other similar languages, function calls are implemented using a call stack. When you call a function, all of the function's local variables are allocated onto a call stack, and when the function finishes, the function is popped back off of the stack. Too many nested calls, and the call stack will overflow.
In Haskell, that's not necessarily how function calls work. In most compilers, like GHC, Haskell function calls are not implemented using call stacks. They are implemented using a completely different process, using allocation of thunks on the heap.
So, most haskell implementations don't implement a call stack for function calls in the first place, so the idea of a stack overflow is non-sensical. It'd be like talking about bath tubs overflowing in a locker room with only showers.
(GHC does use a call stack for thunk evaluations, but not for function calls. So stack overflows that might happen are of a completely different and unrelated nature than stack overflows from Java, C, etc.)
foldrwith tail recursion is horrible, even when disregarding infinite lists.