haskell optimise code and stack overflow in tail recursion

Question

As I am trying to learn functional programming I have decided to do the advent of code challenges in haskell.

While doing challenge 5 https://adventofcode.com/2017/day/5 with input data https://adventofcode.com/2017/day/5/input I have encountered several problem.

This is my code

import Data.Array
import System.IO

listToArray l =
  let n_elem = length l
      pos_val = zip (range (0, n_elem)) l
  in array (0, n_elem-1) pos_val

getData filename = do
  s <- readFile filename
  let l = map read (lines s) ::[Int]
      a = listToArray l 
  return a

-- Part 1

updatePosArray i a =
  let i_val = a ! i
  in (i+i_val, a//[(i, i_val + 1)])

solution1 a n_steps i
 | i >= length a || i < 0 = n_steps
 | otherwise =
    let ai = updatePosArray i a
    in solution1 (snd ai) (n_steps+1) (fst ai)

-- Part 2

updatePosArray2 i a =
  let i_val = a ! i
  in
    if i_val>=3 then (i+i_val, a//[(i, i_val-1)])
    else (i+i_val, a//[(i, i_val+1)])

solution2 a n_steps i
 | i >= length a || i < 0 = n_steps
 | otherwise =
    let ai = updatePosArray2 i a
    in solution2 (snd ai) (n_steps+1) (fst ai)


main = do
  x <- getData "/Users/lucapuggini/Documents/AdventOfCode/data/data_ch5_p1.txt"
  let x_ex = array (0,4) [(0, 0), (1, 3), (2, 0), (3, 1), (4, -3)]

  let n_steps_ex1 = solution1 x_ex 0 0
  print $ n_steps_ex1

  let n_steps1 = solution1 x 0 0
  print $ n_steps1

  let n_steps_ex2 = solution2 x_ex 0 0
  print $ n_steps_ex2

  -- very slow. Probably due to the immutable array
  let n_steps2 = solution2 x 0 0
  print $ n_steps2

and this is the result I get :

lucas-MacBook-Pro:src lucapuggini$ stack runhaskell challenge5.hs

    5
    381680
    10
    stack overflow

the code is quiet slow but this is probably expected due to the fact that I am using immutable array but I am surprised by the stack overflow error. I thought that this should not happen with tail recursion.

In conclusion I have 2 questions:

1) Why am I getting stackoverflow error? Am I using tail recursion wrongly?

2) What is a more efficient but still functional way to run this code? Are immutable array a bad choice?

I am very new to haskell so please be clear.

The "stack" in Haskell is not a function call stack; that's why tail recursion isn't relevant. — melpomene
– melpomene, Commented Dec 23, 2017 at 10:03
this works lucas-MacBook-Pro:src lucapuggini$ stack ghc challenge5.hs [1 of 1] Compiling Main ( challenge5.hs, challenge5.o ) Linking challenge5 ... lucas-MacBook-Pro:src lucapuggini$ ./challenge5 5 381680 10 29717847 — Donbeo
– Donbeo, Commented Dec 23, 2017 at 11:01
What Haskell implementation are you using, and what version? I didn't think GHC had stack overflow errors anymore. — dfeuer
– dfeuer, Commented Dec 23, 2017 at 21:01
Also, yes, if you're writing code like this, it doesn't really make sense to use immutable arrays. Switch to mutable arrays, or use something with faster pure operations like Data.IntMap.Strict. — dfeuer
– dfeuer, Commented Dec 23, 2017 at 21:06

Will Ness · Accepted Answer · 2017-12-24 23:19:33Z

3

With respect to your first question (why the stack overflow):

Using stack runhaskell (or equivalently stack runghc) runs your code in a special "just in time" compilation mode, much the same way expressions entered at the GHCi prompt are run. The code is unoptimized and will frequently exhibit terrible performance characteristics.

For your particular program, this means it runs very slowly, with an ever-expanding memory footprint, and eventually produces a stack overflow.

If you instead compile and run with:

stack ghc -- -O2 challenge5.hs
./challenge5

you'll find that it runs much faster (about a minute on my laptop), in constant memory, and, obviously, without a stack overflow.

As indicated in the comments, a stack overflow error in GHC doesn't really have anything to do with tail recursion. Instead, it arises from a particular aspect of lazy evaluation. (See Do stack overflow errors occur in Haskell?, for example.)

Briefly, GHC creates "thunks" representing unevaluated expressions whose value can be demanded at a future point. Sometimes, these thunks get linked together in a long chain in such a way that when the value of the thunk at one end of the chain is needed, all the thunks down the chain need to be "partially evaluated" to the end of the chain to get the last thunk's value before the program can start calculating thunk values all the way back up the chain. GHC maintains all these "thunk evaluations in process" in a stack with finite size, and that can overflow.

A simple example to trigger a stack overflow is:

-- Sum.hs
main = print $ sum [1..100000000]

If you run this with:

stack runhaskell -- Sum.hs      # using GHC version 8.0.2

you'll get:

Sum.hs: stack overflow

However, compiling it with ghc -O2 is enough to make the problem go away (both for Sum.hs and in your original program). One of the reasons is likely the application of a "strictness analysis" optimization that forces thunks early so these long chains can't form.

With respect to your second question (is an immutable array the right approach):

As @WillNess points out, using unboxed arrays in place of boxed arrays gives a huge performance improvement: on my laptop, an unboxed version of your code runs in 8 secs versus 63 secs.

However, with this type of algorithm -- basically, where a large number of small changes are made incrementally to a vector in such a way that the changes to be made depend on the whole history of accumulated previous changes -- you can do much better with a mutable array. I have a version using Data.Vector.Unboxed.Mutable that runs part 2 in 0.12 secs, and you should be able to achieve similar performance with a mutable unboxed array from Data.Array.Unboxed.

edited Dec 24, 2017 at 23:19

Will Ness

71.6k10 gold badges105 silver badges192 bronze badges

answered Dec 24, 2017 at 19:31

K. A. Buhr

52.7k3 gold badges56 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Donbeo Over a year ago

Mutable array seems to be the most reasonable choice. But I thought that mutable structure should be avoided in functional programming

K. A. Buhr Over a year ago

Those who subscribe to this view would actually say that mutable structures should be avoided in all programming. It's just easier in functional languages. The argument (which I won't make here!) is that immutable algorithms lead to cleaner design and fewer bugs, and the potential inefficiencies of copying data to make small changes can usually be optimized away by careful data structure design and clever compilers. However, some algorithms are still more easily implemented using mutable data structures, so we settle with using them "sparingly".

Will Ness Over a year ago

the problem is with the implementation, whereas for the unboxed immutable array a, a // [(i,e)] in tail position should be O(1), but isn't.

Will Ness · Accepted Answer · 2017-12-24 23:25:16Z

1

If you import Data.Array.Unboxed instead of Data.Array, and declare your array as

    ....
        a :: UArray Int Int 
        a = listToArray l
    return a

or something, in getData, you'll get significant speedup, bordering on miraculous.

Also, you will have to re-implement lengthArr = rangeSize . bounds, so it works for the unboxed arrays.

edited Dec 24, 2017 at 23:25

answered Dec 24, 2017 at 10:47

Will Ness

71.6k10 gold badges105 silver badges192 bronze badges

Collectives™ on Stack Overflow

haskell optimise code and stack overflow in tail recursion

2 Answers 2

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related