3

I have the following implementation of problem 1-4 of the Matasano Cryptopals Challenge, to find one line in a file that is a text string xor'd with a single byte. It works well enough for large files, but displays "Stack space overflow: current size 8388608 bytes." for the file provided.

import System.IO
import System.Environment
import Control.Monad
import Data.Bits
import Data.Word
import Data.Maybe
import Data.List hiding (maximumBy)
import Data.Char
import Data.Ord
import Data.Foldable hiding (sum)


hexChars = "0123456789ABCDEF"

hexToBytes :: String -> Maybe [Word8]
{- Converts a hex string into a byte array -}

hexToBytes hexes = hexToBytes' (map toUpper hexes)
hexToBytes' (char1 : char2 : xs) = do
    tail <- hexToBytes' xs
    byte1 <- char1 `elemIndex` hexChars
    byte2 <- char2 `elemIndex` hexChars
    return ((fromIntegral(byte1*16 + byte2) :: Word8) : tail)
hexToBytes' [_] = Nothing
hexToBytes' [] = Just []

maxBy :: Ord b => Foldable f => (a -> b) -> f a -> a
maxBy = maximumBy . comparing

bytesToString :: Integral i => Monad m => m i -> m Char
bytesToString = liftM (chr . fromIntegral)  

isLowercase x = (x >= 'a') && (x <= 'z')

asciiCheck :: Word8 -> Int
asciiCheck x = if (isLowercase . chr . fromIntegral) x then 1 else 0

score = (sum . map asciiCheck)

readLines :: Handle -> IO [String]
readLines handle = do
    eof <- hIsEOF handle
    if eof then
        return []
    else liftM2 (:) (hGetLine handle) (readLines handle)

decode key = map (xor key)

keys = [minBound ..] :: [Word8]

massDecode inputs =
    maxBy score (liftM2 decode keys inputs)

main = do
    hSetEncoding stdout latin1
    args <- getArgs
    handle <- case args of
        [] -> return stdin
        (x:xs) -> openFile x ReadMode
    lines <- readLines handle
    putStrLn $ bytesToString $ massDecode $ catMaybes $ map hexToBytes lines

The program works by traversing a list containing every input line xor'd with every possible key. I suspect that this large list is responsible for the overflow somehow, but I assumed that this would not cause memory issues because the list would be generated lazily. I don't think I have a firm enough understanding of when thunks are evaluated to intuit how this is causing a stack overflow.

So my question is: Why is generating or traversing this list causing a stack overflow?

7
  • 3
    yes your code would help - till then maybe you can have a look at this Wiki article dealing with the different folds and their properties Commented Aug 19, 2015 at 19:32
  • foldl' will probably alleviate any space issues... Commented Aug 19, 2015 at 19:46
  • 1
    Also, be sure you're compiling with optimization (-O2). Commented Aug 19, 2015 at 19:54
  • @recursion.ninja: I'm using maximumBy for the folding. Commented Aug 19, 2015 at 21:18
  • 1
    Well since someone deleted my answer (admittedly wasn't fleshed out but I was at work and didn't have time to finish it out), there is a good explanation of the issue you're running into here Commented Aug 20, 2015 at 2:34

1 Answer 1

1

As discussed in this thread, the maximumBy on list that we now have in the prelude isn't optimally performant, for a variety of, well, let's just say "reasons."

If you use the proposed maximumBy' function, which is strict, you'll likely fix the leak:

foldl1' f l = foldl' f (head l) (tail l)

maximumBy' f = foldl1' max'
  where max' x y = case f x y of
                        GT -> y
                        _  -> x
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.