
I'm doing some of the Project Euler problems (not as homework, just for fun/learning), and I'm learning Haskell. One of the problems is to find the longest Collatz sequence with a starting number under 1 million (http://projecteuler.net/problem=14).

So anyhow, I was able to do it, and my algorithm works and gets the right answer rather rapidly when compiled. However, it recurses to a depth of 1,000,000.

So my question is: did I do this right? As written, is this the proper Haskell way to do it? How could I make it faster? Also, regarding memory usage, how is the recursion actually implemented at a low level? That is, how does it use memory?

(SPOILER ALERT: Don't look at this if you want to solve Project Euler's problem #14 on your own without looking at the answer.)

--haskell script --problem: find the longest collatz chain for a starting number less than 1 million.

collatzLength x | x == 1    = 1
                | otherwise = 1 + collatzLength (nextStep x)


longestChain (num, numLength) bound counter
           | counter >= bound = (num, numLength)
           | otherwise = longestChain (longerOf (num,numLength)
             (counter,   (collatzLength counter)) ) bound (counter + 1)
           --I know this is a messy function, but I was doing this problem just 
           --for myself, so I didn't bother making some utility functions for it.
           --also, I split the big line in half to display on here nicer, would
           --it actually run with this line split?


longerOf (a1,a2) (b1,b2)| a2 > b2 = (a1,a2)
                        | otherwise = (b1,b2)

nextStep n | mod n 2 == 0 = (n `div` 2)
           | otherwise = 3*n + 1

main = print (longestChain (0,0) 1000000 1)

The program runs in about 7.5 seconds when compiled with -O2.

So, any advice/suggestions? I want to try to get the program to run faster with less memory usage, and I want to do it in a very Haskellian (that should be a word) way.

Thanks in advance!

  • I would try making collatzLength iterative (tail recursive). Right now you build the result using recursion 1 + (1 + (1 + ...)). Apply the same tail recursion technique that you have for longestChain. Commented Jun 17, 2013 at 19:41
  • hpaste.org/90090 Commented Jun 18, 2013 at 13:14

1 Answer


Edit to Answer Questions

did I do this right?

Almost. As the comments said, you build a big thunk of 1+(1+(1+...)); use a strict accumulator instead, or a higher-order function that handles the accumulation for you. There are other minor points, like hand-writing a function to compare second elements instead of using maximumBy (comparing snd), but that's more stylistic.
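For concreteness, here is a minimal sketch of the maximumBy (comparing snd) idiom mentioned above; longestPair is a hypothetical helper name, not from the original code:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical helper: picks the pair with the larger second element,
-- replacing a hand-written comparison like longerOf.
longestPair :: [(Int, Int)] -> (Int, Int)
longestPair = maximumBy (comparing snd)

main :: IO ()
main = print (longestPair [(3, 8), (6, 9), (7, 17)])  -- prints (7,17)
```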

As is, is this the proper Haskell way to do it?

It is acceptably idiomatic Haskell code.

How could I make it faster?

See the benchmarks below. The most common answers for Euler performance questions are:

  • Use -O2 (as you did)
  • Try -fllvm (GHC's native code generator is sub-optimal)
  • Use worker/wrappers to reduce arguments or, in your case, leverage an accumulator.
  • Use fast/unboxable types (Int instead of Integer when you can, Int64 if you need a guaranteed width for portability, etc.).
  • Use rem instead of mod when all the values are positive. For your case it is also useful to know, or discover, that div tends to compile to something slower than quot.
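A short sketch of what the rem/quot change looks like for this problem; it is safe here because a Collatz chain started from a positive number only visits positive values:

```haskell
-- nextStep with rem/quot instead of mod/div; behaviour is the same
-- for positive inputs, but the generated code tends to be faster.
nextStep :: Int -> Int
nextStep n | n `rem` 2 == 0 = n `quot` 2
           | otherwise      = 3 * n + 1

main :: IO ()
main = print (map nextStep [6, 7])  -- prints [3,22]
```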

Also, with memory usage, how is the recursion actually implemented on the low-level? As in how does it use memory?

Both of these questions are very broad. Complete answers would likely need to address lazy evaluation, tail-call optimization, worker transformations, garbage collection, etc. I suggest you explore these answers in more depth over time (or hope someone here writes the complete answer I'm avoiding).

Original Post - Benchmark numbers

Original:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m5.971s
user    0m5.940s
sys 0m0.019s

Use a worker function with an accumulator for collatzLength:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m5.617s
user    0m5.590s
sys 0m0.012s

Use Int instead of letting the numeric types default to Integer; the code is also easier to read with type signatures!
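To make the Int change concrete, here is a minimal standalone sketch: the top-level signature pins the working type to Int, so the numeric literals no longer default to Integer (the recursion style here is the naive one, just to isolate the type change):

```haskell
-- The signature forces Int; without it, GHC defaults this to Integer.
collatzLength :: Int -> Int
collatzLength 1 = 1
collatzLength x
  | even x    = 1 + collatzLength (x `quot` 2)
  | otherwise = 1 + collatzLength (3 * x + 1)

main :: IO ()
main = print (collatzLength 6)  -- chain 6,3,10,5,16,8,4,2,1: prints 9
```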

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m2.937s
user    0m2.932s
sys 0m0.001s

Use rem and not mod:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m2.436s
user    0m2.431s
sys 0m0.001s

Use quotRem and not rem then div:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m1.672s
user    0m1.669s
sys 0m0.002s

This is all very much like a previous question: Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell

EDIT: and yes, as Daniel Fischer suggests, using bit ops of .&. and shiftR improves on quotRem:

$ ghc -O2 so.hs ; time ./so
(837799,525)

real    0m0.314s
user    0m0.312s
sys 0m0.001s

Or you can just use LLVM and let it do its magic (NB: this version still uses quotRem):

$ time ./so
(837799,525)

real    0m0.286s
user    0m0.283s
sys 0m0.002s

LLVM actually does well so long as you avoid the hideousness that is mod: it optimizes the guard-based code using rem to perform as well as the hand-optimized .&. with shiftR.

The result is around 20x faster than the original.

EDIT: People are surprised that quotRem performs as well as bit manipulation with Int. The code is included below, but I'm not clear on the surprise: just because a value might be negative doesn't mean it can't be handled with very similar bit manipulations at identical cost on the right hardware. All three versions of nextStep perform identically (ghc -O2 -fforce-recomp -fllvm, GHC 7.6.3, LLVM 3.3, x86-64).

{-# LANGUAGE BangPatterns, UnboxedTuples #-}

import Data.Bits

collatzLength :: Int -> Int
collatzLength x | x == 1    = 1
                | otherwise = go x 0
 where
    go 1 a  = a + 1
    go x !a = go (nextStep x) (a+1)


longestChain :: (Int, Int) -> Int -> Int -> (Int,Int)
longestChain (num, numLength) bound !counter
   | counter >= bound = (num, numLength)
   | otherwise = longestChain (longerOf (num,numLength) (counter, collatzLength counter)) bound (counter + 1)


longerOf :: (Int,Int) -> (Int,Int) -> (Int,Int)
longerOf (a1,a2) (b1,b2)| a2 > b2 = (a1,a2)
                        | otherwise = (b1,b2)
{-# INLINE longerOf #-}

nextStep :: Int -> Int
-- Version 'bits'
nextStep n = if 0 == n .&. 1 then n `shiftR` 1 else 3*n+1
-- Version 'quotRem'
-- nextStep n = let (q,r) = quotRem n 2 in if r == 0 then q else 3*n+1
-- Version 'almost the original'
-- nextStep n | even n = quot n 2
--            | otherwise  = 3*n + 1
{-# INLINE nextStep #-}


main = print (longestChain (0,0) 1000000 1)

10 Comments

Now use n .&. 1 == 0 and n `shiftR` 1 instead of quotRem.
@DanielFischer I read many times that a decent compiler should do such an optimization itself. Do you know how it is with GHC (perhaps with the LLVM backend)? Does using explicit bit shift operations really help?
@PetrPudlák GHC has not yet been taught many such optimisations; too few people are working on it, and their time is spent on higher-level optimisations and type-system trickery, so it does not do that by itself (so far). The LLVM backend does that optimisation when you use the right type (and last I checked, only for quot/rem, not for div/mod). But of course, with Int, it has to account for the possibility of negative numbers, so you can beat it (not by much) using the domain knowledge that only positive numbers occur [unless Int is 32 bits, in which case you have overflow].
At least on my box, even with the LLVM backend, simply shifting beats the LLVM optimisation (by a surprisingly large margin, ~25%), since LLVM doesn't know that no negative numbers occur. The difference disappears when using Word.
@Satvik they are indistinguishable. I assume LLVM recognizes the assembly equivalent of quotRem x 2 and rewrites it.