
I have a dynamic programming algorithm and I find my Haskell implementation very satisfying, as it allows me to define the master array recursively, like so:

import Data.Array

fill :: Int -> Int -> Entry
fill 0 0 = Entry Origin 0.0
fill 0 j = ...
fill i 0 = ...
fill i j = f m i j

m :: Array (Int, Int) Entry
m = listArray dims [fill i j | (i, j) <- range dims]

where f is a moderately complex function referring back to the main array m. The Entry type is just a Double with a small annotation.

The data itself is quite large, and m ends up having on the order of 100M entries. The algorithm is still blazing fast, but it uses ~25GB RAM in the process. From reading around I understand that this is because it keeps around the full computation of the array entries instead of the final value, but if I switched to unboxed arrays, I could not define them recursively like in the example above.

Is there a way to have a cake (low memory footprint) and eat it (recursive definition)?
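For reference, the lazy-array trick in the question, filled in for a toy problem so it runs as-is. The recurrence (edit distance) is purely illustrative, but `fill`, `m`, and `dims` mirror the question's names. Boxed `Array` is lazy in its elements, so `fill` may freely read other entries of `m`; demanding `m ! (n, k)` forces exactly the entries it depends on:

```haskell
import Data.Array

-- Toy instance of the question's pattern: edit distance, with the
-- table m defined recursively in terms of itself.
editDistance :: String -> String -> Int
editDistance xs ys = m ! (n, k)
  where
    n = length xs
    k = length ys
    dims = ((0, 0), (n, k))
    -- Index the inputs as arrays for O(1) character lookup.
    xa = listArray (1, n) xs
    ya = listArray (1, k) ys

    fill :: Int -> Int -> Int
    fill 0 j = j
    fill i 0 = i
    fill i j
      | xa ! i == ya ! j = m ! (i - 1, j - 1)
      | otherwise        = 1 + minimum [ m ! (i - 1, j)
                                       , m ! (i, j - 1)
                                       , m ! (i - 1, j - 1) ]

    -- The master array, defined in terms of fill, which reads m back.
    m :: Array (Int, Int) Int
    m = listArray dims [fill i j | (i, j) <- range dims]
```

The price of this elegance is exactly the problem described above: every cell of `m` is a pointer to a boxed value (or an unevaluated thunk), not a flat machine word.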

  • According to your numbers, it consumes roughly 250 B per entry. Perhaps you could try improving that by making your Entry type stricter, e.g. by adding strictness annotations and/or unpacking. You can't avoid the boxing of Entry itself and use an unboxed array, but you probably can remove the boxing from Entry's components. How is that type defined? Commented Dec 9, 2018 at 15:38
  • Are you definitely using the entire matrix for your computation, or will there be some unevaluated entries? A lot? Commented Dec 9, 2018 at 23:57
  • Would it be okay with you if you used 25 GB temporarily -- e.g. during a precomputation phase at program startup or perhaps even before -- as long as the algorithm itself uses less during its run? Commented Dec 10, 2018 at 13:02
  • Thanks everyone for your replies. @chi data Entry = Entry Source Double, data Source = Origin | Left | Right | Top | Bottom. So super simple. I believe that unboxing the Double part could help, because that's the recursively defined part. @luqui I'd estimate that maybe half of the entries are actually used, and memory profiling supports that. @DanielWagner I'm curious what you have in mind. In principle, 25 GB at any stage is a bit heavy, since I'd like to eventually run this on smaller machines, but I'm definitely listening. Commented Dec 10, 2018 at 18:56
  • I'd definitely try data Entry = Entry {-# UNPACK #-} !Source {-# UNPACK #-} !Double for such a basic type. Don't forget to turn on optimizations with -O (and to avoid GHCi when measuring performance). Commented Dec 10, 2018 at 19:14
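Concretely, the stricter type suggested in the comments would look like the sketch below (the constructors are the asker's). One caveat worth knowing: GHC only honours {-# UNPACK #-} on strict fields whose type has a single constructor, so the pragma flattens the Double into the Entry closure but is silently ignored on the five-constructor Source:

```haskell
-- The asker's types, tightened per the comments.  The bang on Source
-- avoids a thunk; {-# UNPACK #-} on the Double stores it inline as a
-- raw Double# instead of a pointer to a boxed Double.  An UNPACK on
-- the Source field would be a no-op: GHC can only unpack
-- single-constructor types.
data Source = Origin | Left | Right | Top | Bottom

data Entry = Entry !Source {-# UNPACK #-} !Double

-- Hypothetical accessor, just to exercise the type.
entryVal :: Entry -> Double
entryVal (Entry _ x) = x
```

This shrinks each Entry to a header, a Source pointer, and an inline Double, but every Entry is still one heap object, so it cannot close the whole gap on its own.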

1 Answer


In the end, I got reasonable results with an unboxed mutable array, like so:

import Data.Array.ST as UM
import Control.Monad (forM_)

mbounds = ((0,0), (..., ...))
m = UM.runSTUArray $ do
    arr <- UM.newArray_ mbounds
    let set = UM.writeArray arr
    let get = UM.readArray arr
    let fill (0, 0) = do set (0, 0) 0.0
        fill (i, 0) = do top <- get (i-1, 0)
                         set (i, 0) (f top)
        fill (0, j) = do ...
        fill (i, j) = do ...
    forM_ (UM.range mbounds) fill
    return arr

It still hogs more RAM than I would like (~4 GB), but that's a huge improvement, and I suspect that once I re-run the profiler I'll find ways to cut it down further.
