Slow array access in haskell?

Question

I'm doing this Car Game problem on Kattis: https://open.kattis.com/problems/cargame There's a five-second time limit, but on the last instance, my code requires longer to run. I'm fairly sure I'm doing the right thing (from a big-O standpoint) so now I need to optimize it somehow. I downloaded the test data from: http://challenge.csc.kth.se/2013/challenge-2013.tar.bz2

From profiling, it seems like most of the running time is spent in containsSub which is nothing more than an array access together with a tail-recursive call. Furthermore, it's only called about 100M times, so why does it take 6.5 seconds to run (6.5 s on my laptop. I've found Kattis is generally about twice as slow, so probably more like 13 seconds). On the statistics page, some of the C++ solutions run in under a second. Even some python solutions just barely make it under the 5-second bar.

module Main where

import           Control.Monad
import           Data.Array            (Array, (!), (//))
import qualified Data.Array            as Array
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as BS
import           Data.Char
import           Data.List
import           Data.Maybe

main::IO()
main = do
  [n, m] <- readIntsLn
  dictWords <- replicateM n BS.getLine
  let suffixChains = map (\w -> (w, buildChain w)) dictWords
  replicateM_ m $ findChain suffixChains

noWordMsg :: ByteString
noWordMsg = BS.pack "No valid word"

findChain :: [(ByteString, WordChain)] -> IO ()
findChain suffixChains = do
  chrs <- liftM (BS.map toLower) BS.getLine
  BS.putStrLn
    (
      case find (containsSub chrs . snd) suffixChains of
        Nothing -> noWordMsg
        Just (w, _) -> w
    )

readAsInt :: BS.ByteString -> Int
readAsInt = fst . fromJust . BS.readInt

readIntsLn :: IO [Int]
readIntsLn = liftM (map readAsInt . BS.words) BS.getLine

data WordChain = None | Rest (Array Char WordChain)

emptyChars :: WordChain
emptyChars = Rest . Array.listArray ('a', 'z') $ repeat None

buildChain :: ByteString -> WordChain
buildChain s =
  case BS.uncons s of
    Nothing -> emptyChars
    Just (hd, tl) ->
      let wc@(Rest m) = buildChain tl in
      Rest $ m // [(hd, wc)]

containsSub :: ByteString -> WordChain -> Bool
containsSub _ None = False
containsSub s (Rest m) =
  case BS.uncons s of
    Nothing -> True
    Just (hd, tl) -> containsSub tl (m ! hd)

EDIT: TAKE 2:

I tried building a lazy trie to avoid searching things I'd already searched. So for instance, if I've already encountered a triplet beginning with 'a', then in the future I can skip anything which doesn't contain an 'a'. If I've already searched a triplet beginning 'ab', I can skip anything which doesn't contain 'ab'. And if I've already searched the exact triplet 'abc', I can just return the same result from last time. In theory, this should contribute a significant speedup. In practice the running time is identical.

Furthermore, without the seq's, profiling takes forever and gives bogus results (I couldn't guess why). With the seqs, profiling says the bulk of the time is spent in forLetter (which is where the array accesses have been moved to so again it looks like array access is the slow part)

{-# LANGUAGE TupleSections #-}

module Main where

import           Control.Monad
import           Data.Array            (Array, (!), (//))
import qualified Data.Array            as Array
import qualified Data.Array.Base       as Base
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as BS
import           Data.Char
import           Data.Functor
import           Data.Maybe

main::IO()
main = do
  [n, m] <- readIntsLn
  dictWords <- replicateM n BS.getLine
  let suffixChainsL = map (\w -> (w, buildChain w)) dictWords
  let suffixChains = foldr seq suffixChainsL suffixChainsL
  suffixChains `seq` doProbs m suffixChains

noWordMsg :: ByteString
noWordMsg = BS.pack "No valid word"

doProbs :: Int -> [(ByteString, WordChain)] -> IO ()
doProbs m chains = replicateM_ m doProb
  where
    cf = findChain chains
    doProb =
      do
        chrs <- liftM (map toLower) getLine
        BS.putStrLn . fromMaybe noWordMsg $ cf chrs

findChain :: [(ByteString, WordChain)] -> String -> Maybe ByteString
findChain [] = const Nothing
findChain suffixChains@(shd : _) = doFind
  where
    letterMap :: Array Char (String -> Maybe ByteString)
    letterMap =
      Array.listArray ('a','z')
        [findChain (mapMaybe (forLetter hd) suffixChains) | hd <- [0..25]]
    endRes = Just $ fst shd
    doFind :: String -> Maybe ByteString
    doFind [] = endRes
    doFind (hd : tl) = (letterMap ! hd) tl
    forLetter :: Int -> (ByteString, WordChain) -> Maybe (ByteString, WordChain)
    forLetter c (s, WC wc) = (s,) <$> wc `Base.unsafeAt` c

readAsInt :: BS.ByteString -> Int
readAsInt = fst . fromJust . BS.readInt

readIntsLn :: IO [Int]
readIntsLn = liftM (map readAsInt . BS.words) BS.getLine

newtype WordChain = WC (Array Char (Maybe WordChain))

emptyChars :: WordChain
emptyChars = WC . Array.listArray ('a', 'z') $ repeat Nothing

buildChain :: ByteString -> WordChain
buildChain = BS.foldr helper emptyChars
  where
    helper :: Char -> WordChain -> WordChain
    helper hd wc@(WC m) = m `seq` WC (m // [(hd, Just wc)])

The profiling might attribute cost to create the WordChain to the creation is actually forced (in containsSub), due to lazyness. Just a guess, though. — Joachim Breitner
– Joachim Breitner, Commented Dec 3, 2014 at 18:34
I tried adding seq's to ensure that the wordchains are all created before reading the list of subsequences. The vast bulk of the work is still being done in containsSub. This is reasonable since containsSub is called (in the worst case) 5000*10000*3 times whereas only 5000*100 WordChains are created — dspyz
– dspyz, Commented Dec 3, 2014 at 18:48
I got this from 3.7s to 2.5s (for the 2nd sample problem) by changing the datatype to data WordChain = None | Rest {-# UNPACK #-} !(Array Int WordChain); using m `unsafeAt` (fromIntegral hd - 97) instead of ! and unsafeReplace m [(fromIntegral hd - 97, wc)] instead of //. GC time is 25% so you can probably squeeze out a bit more by getting the GC to not run by not constructing intermediate bytestrings. — user2407038
– user2407038, Commented Dec 3, 2014 at 22:28
I don't know if this will help, because I don't know if it's really near the problem, but buildChain appears to have the structure of a fold. Using a fold function may help speed things up a bit over unconsing, or at least make the code easier to read. — dfeuer
– dfeuer, Commented Dec 4, 2014 at 5:39

ErikR · Accepted Answer · 2014-12-03 18:39:08Z

2

The uncons call in containsSub creates a new ByteString. Try speeding it up by keeping track of the offset into the string with an index, e.g.:

containsSub' :: ByteString -> WordChain -> Bool
containsSub' str wc = go 0 wc
  where len = BS.length str
        go _ None = False
        go i (Rest m) | i >= len = True
                      | otherwise = go (i+1) (m ! BS.index str i)

answered Dec 3, 2014 at 18:39

ErikR

52.2k9 gold badges78 silver badges136 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

dspyz Over a year ago

I tried this and it shaved off about half a second. Also, according to hackage.haskell.org/package/bytestring-0.10.4.1/docs/… uncons is an O(1) operation. I assume that means it's reusing the memory for the tail of the string rather than allocating a new one

ErikR Over a year ago

It is reusing the memory for the tail of the string, but a ByteString value is a length + a pointer to the chars, and that is the structure uncons has to create.

willy-s Over a year ago

If the array access is limiting you could try eliminating bounds checking by replacing ! with unsafeAt from Data.Array.Base.

dspyz · Accepted Answer · 2014-12-07 19:11:39Z

1

After much discussion on the #haskell and #ghc IRC channels, I found that the problem was related to this ghc bug: https://ghc.haskell.org/trac/ghc/ticket/1168

The solution was simply to change the definition of doProbs

doProbs m chains = cf `seq` replicateM_ m doProb
...

Or just to compile with -fno-state-hack

ghc's state hack optimization was causing it to unnecessarily recompute cf (and the associated letterMap) on every call.

So it has nothing to do with array accesses.

answered Dec 7, 2014 at 19:11

dspyz

5,6042 gold badges31 silver badges71 bronze badges

Collectives™ on Stack Overflow

Slow array access in haskell?

2 Answers 2

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related