I have the following implementation of problem 1-4 of the Matasano Cryptopals Challenge, to find one line in a file that is a text string xor'd with a single byte. It works well enough for large files, but displays "Stack space overflow: current size 8388608 bytes." for the file provided.
import System.IO
import System.Environment
import Control.Monad
import Data.Bits
import Data.Word
import Data.Maybe
import Data.List hiding (maximumBy)
import Data.Char
import Data.Ord
import Data.Foldable hiding (sum)
hexChars = "0123456789ABCDEF"
hexToBytes :: String -> Maybe [Word8]
{- Converts a hex string into a byte array -}
hexToBytes hexes = hexToBytes' (map toUpper hexes)
hexToBytes' (char1 : char2 : xs) = do
tail <- hexToBytes' xs
byte1 <- char1 `elemIndex` hexChars
byte2 <- char2 `elemIndex` hexChars
return ((fromIntegral(byte1*16 + byte2) :: Word8) : tail)
hexToBytes' [_] = Nothing
hexToBytes' [] = Just []
maxBy :: Ord b => Foldable f => (a -> b) -> f a -> a
maxBy = maximumBy . comparing
bytesToString :: Integral i => Monad m => m i -> m Char
bytesToString = liftM (chr . fromIntegral)
isLowercase x = (x >= 'a') && (x <= 'z')
asciiCheck :: Word8 -> Int
asciiCheck x = if (isLowercase . chr . fromIntegral) x then 1 else 0
score = (sum . map asciiCheck)
readLines :: Handle -> IO [String]
readLines handle = do
eof <- hIsEOF handle
if eof then
return []
else liftM2 (:) (hGetLine handle) (readLines handle)
decode key = map (xor key)
keys = [minBound ..] :: [Word8]
massDecode inputs =
maxBy score (liftM2 decode keys inputs)
main = do
hSetEncoding stdout latin1
args <- getArgs
handle <- case args of
[] -> return stdin
(x:xs) -> openFile x ReadMode
lines <- readLines handle
putStrLn $ bytesToString $ massDecode $ catMaybes $ map hexToBytes lines
The program works by traversing a list containing every input line xor'd with every possible key. I suspect that this large list is responsible for the overflow somehow, but I assumed that this would not cause memory issues because the list would be generated lazily. I don't think I have a firm enough understanding of when thunks are evaluated to intuit how this is causing a stack overflow.
So my question is: Why is generating or traversing this list causing a stack overflow?
foldl'will probably alleviate any space issues...-O2).