7

I am writing a program to parse .TGA files using Haskell - however, the performance is absolutely dreadful. A 2048 x 2048 pixel image takes a few seconds to parse.

I ran my code using +RTS -p -RTS and received the following interesting pieces from the report:

total time = 1.08 secs
total alloc = 3,037,568,120 bytes

COST CENTRE           MODULE    %time    %alloc
readPixelMap            Main     33.0      11.0
readPixelMap.getByte    Main     32.7      75.1
readPixelMap.getColor   Main     27.0      13.3

It appears that my program is allocating a huge amount of memory in the readPixelMap function. That function looks like this:

readPixelMap width height = do
    pixels <- replicateM height (replicateM width getColor)
    return $ PixMap pixels
    where getByte = liftM toInteger getWord8
          getColor = do (red:green:blue:alpha:xs) <- replicateM 4 getByte
                        return (red, green, blue, alpha)

and a PixMap is defined as

data PixMap = PixMap [[RGBAColor]] deriving (Show)
type RGBAColor = (Integer, Integer, Integer, Integer)

readPixelMap is called using the Get monad from Data.Binary:

main = do
    binary <- BS.readFile "test.tga"
    let (pixelMap, binary', nil) = runGetState (readPixelMap 2048 2048) binary 0

I've been working under the impression that a 2048 x 2048 .TGA image should load into a few hundred megabytes (at most) of memory. Is there an obvious way to improve my garbage collection times / allocation amount?

2 Answers 2

11

Your RGBColor type uses 5 machine words, 4 of which are pointers to each Integer. Each Integer consists of one machine word for GC/tagging, and either one additional word for the small representation, or the large representation consisting of a pointer to a byte array for the GMP data and a limb count.

Additionally, you're using a list of lists to store the values. Each non-empty list node is a word for GC/tagging, a word pointer to the value, and a word pointer to the tail.

To use less memory, use appropriate data types. Use unpacked values where possible. Use Word8 instead of Integer for 8-bit values. Use arrays, instead of lists. You know, the basics of acting like you care about memory use.

Take a look at http://johantibell.com/files/slides.pdf for some of the basics.

Sign up to request clarification or add additional context in comments.

Comments

2
  1. Using a tuple of Integer to store colour data (which is always 32 bits of data) doesn't make sense. Why not use any of the following variants: (Word8, Word8, Word8, Word8); data Colour = C Word8 Word8 Word8 Word; data Colour = C Word32

  2. A lazy ByteString is just a list of strict ByteString. So if you are reading the data byte by byte, you will essentially produce a list of strict ByteString whose length is 2048*2048. Not very efficient.

  3. The same can be said of using [[a]] to store a two dimensional array. In terms of memory performance, this is just about the worst datatype you could use. Try any one of these: Array (Int, Int) Color; UArray (Int, Int) Color; (Int, Int, Ptr Word8). My preference is the arrays from Data.Array.Repa which are high performance, have many built-in traversals, folds, etc., and naturally support parallelization. Another big advantage is Data.Array.Repa.Repr.ByteString.fromByteString will convert a strict ByteString into an array for further manipulation. The best part about this is you no longer have to worry about performance when reading the array; it has been done for you!

An example:

import Data.Array.Repa.Repr.ByteString
import Data.Array.Repa
import qualified Data.ByteString as B
import Data.Word (Word8)

data TGA = TGA (Array B DIM2 Word8)

readTGAFromFile :: FilePath -> (Int, Int) -> IO TGA
readTGAFromFile file (width,height) = do
  dat <- B.readFile file
  return $ TGA $ fromByteString (Z :. height :. width) dat

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.