I want to write a structured binary file in Haskell. For example, assume that the first four bytes should be "TEST" (as ASCII), followed by the numbers 1, 2, 3, 4 (one byte each), then 32 bytes each with the value 128, and finally the number 2048 as a 16-bit little-endian value.

That means the created file (as hex) should look like this:

54 45 53 54 01 02 03 04 80 80 [... 30 bytes more ...] 00 08

So basically I have a custom data structure, let's say

data MyData = MyData {
    header :: String     -- "TEST"
  , n1     :: Integer    -- 1
  , n2     :: Integer    -- 2
  , n3     :: Integer    -- 3
  , n4     :: Integer    -- 4
  , block  :: [Integer]  -- 32 times 128
  , offset :: Integer    -- 2048
}

Now I want to write this data to a file, so I need to convert this structure into one long ByteString. I could not find a clean, idiomatic way to do this. Ideally I would have a function

myDataToByteString :: MyData -> ByteString

or

myDataToPut :: MyData -> Put

but I could not figure out how to write such a function.

Background information: I want to write a song in the Impulse Tracker format (http://schismtracker.org/wiki/ITTECH.TXT), which is the binary format used by the Schism Tracker software.

Update 1

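For the endian conversion, I guess I can just extract the individual bytes as follows:

import Data.Bits (shift, (.&.))

-- Extract the b-th byte of num, counting from 1 at the least significant byte.
getByte :: Int -> Int -> Int
getByte b num = shift (num .&. bitMask b) (8-8*b)
  where bitMask b = sum $ map (2^) [8*b-8 .. 8*b-1]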

2 Answers

The binary and cereal packages provide easy ByteString serialization of data structures. Here's an example using binary:

{-# LANGUAGE DeriveGeneric #-} 

import Data.Binary
import qualified Data.ByteString.Char8 as B8
import Data.Word (Word8)
import GHC.Generics

data MyData = MyData                  
    { header :: B8.ByteString               
    , n1     :: Int                 
    , n2     :: Int                  
    , n3     :: Int                 
    , n4     :: Int                 
    , block  :: [Word8]               
    , offset :: Int                 
    } deriving (Generic, Show)

instance Binary MyData

myData = MyData (B8.pack "Test") 1 2 3 4 (replicate 32 128) 2048

Notice I've changed the type of block to [Word8], as the question states that these are to be bytes.

We derive a Generic instance for MyData, which allows the binary package to generate a Binary instance automatically. Now we can serialize our data to a lazy ByteString using encode, and deserialize it with decode:

λ. let encoded = encode myData
λ. :t encoded
encoded :: Data.ByteString.Lazy.Internal.ByteString
λ. let decoded = decode encoded :: MyData
λ. decoded
MyData {header = "Test", n1 = 1, n2 = 2, n3 = 3, n4 = 4, block = [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128], offset = 2048}
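
One caveat worth spelling out: the generically derived instance uses binary's own encoding (an Int is written as 8 big-endian bytes, and lists and ByteStrings get a length prefix), so the output round-trips through decode but does not match the 42-byte layout from the question. If the exact layout matters, one option is to write the Put action by hand. The following is only a sketch under assumptions that are not in the original answer: the field types are changed to Word8 and Word16 so that they match the on-disk widths, and the name sample is made up for illustration.

```haskell
import Data.Binary.Put (Put, putByteString, putWord8, putWord16le, runPut)
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16)

-- Assumption of this sketch: field types mirror the on-disk widths.
data MyData = MyData
    { header :: B8.ByteString  -- 4 ASCII bytes, e.g. "TEST"
    , n1     :: Word8
    , n2     :: Word8
    , n3     :: Word8
    , n4     :: Word8
    , block  :: [Word8]        -- 32 bytes in the question's example
    , offset :: Word16         -- written as 2 bytes, little endian
    }

-- Emit each field in file order; no length prefixes are added.
myDataToPut :: MyData -> Put
myDataToPut d = do
    putByteString (header d)
    mapM_ putWord8 [n1 d, n2 d, n3 d, n4 d]
    mapM_ putWord8 (block d)
    putWord16le (offset d)

-- Run the Put action to obtain a lazy ByteString.
myDataToByteString :: MyData -> BL.ByteString
myDataToByteString = runPut . myDataToPut

-- Hypothetical sample value matching the question's example.
sample :: MyData
sample = MyData (B8.pack "TEST") 1 2 3 4 (replicate 32 128) 2048
```

With these definitions, BL.writeFile "test.bin" (myDataToByteString sample) should produce a 42-byte file that starts with 54 45 53 54 01 02 03 04 and ends with 00 08.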

5 Comments

I cannot get your example to run. If I load your main file in ghci, I get Not in scope: type constructor or class 'ByteString'. If I add import qualified Data.ByteString as B and change ByteString to B.ByteString, I get Couldn't match expected type 'B.ByteString' with actual type '[Char]'
@Julian: I've edited my answer to include imports and proper String to ByteString conversion.
That helped a lot - but when I try to display or save the encoded ByteString, I still get a No instance nor default method for class operation Data.Binary.put. What am I missing?
If you're running this code in ghci make sure you're using :set -XDeriveGeneric
Doesn't change anything, same thing when I run with "runhaskell". The error occurs when I try to do BS.writeFile "test" encoded or BS.putStr encoded. I guess I am using the wrong ByteString module (I use import qualified Data.ByteString.Lazy as BS).

You could try this one (not optimal, I am afraid):

import qualified Data.List as L
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString as B

data MyData = MyData {
      header :: String     -- "TEST"
    , n1     :: Integer    -- 1
    , n2     :: Integer    -- 2
    , n3     :: Integer    -- 3
    , n4     :: Integer    -- 4
    , block  :: [Integer]  -- 32 times 128
    , offset :: [Integer]  -- 2048 in little endian
} deriving (Show)

d = MyData "TEST" 1 2 3 4 (L.replicate 32 128) [0, 8]

myDataToByteString :: MyData -> B.ByteString
myDataToByteString (MyData h n1 n2 n3 n4 b o) =
    B.concat [
          BC.pack h
        , B.pack (map fromIntegral [n1, n2, n3, n4])
        , B.pack (map fromIntegral b)
        , B.pack (map fromIntegral o)
    ]

3 Comments

That's great, but you cheated a little bit - you manually entered [0, 8] as the little endian representation for 2048. So how do I arrive there programmatically?
@Julian Oh, you can use functions of Data.Bits to do that, it should not be hard.
Thanks for the tip, I guess that will do the trick (see update in question).
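
To make the Data.Bits suggestion concrete, here is a minimal sketch of computing the little-endian bytes instead of typing [0, 8] by hand. The helper name toLittleEndian and its byte-count parameter are made up for this example, and it assumes a non-negative input.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- | Split a non-negative number into n bytes, least significant byte first.
-- Bits above the first n bytes are silently dropped.
toLittleEndian :: Int -> Integer -> [Word8]
toLittleEndian n x =
    [ fromIntegral ((x `shiftR` (8 * i)) .&. 0xFF) | i <- [0 .. n - 1] ]
```

For instance, toLittleEndian 2 2048 evaluates to [0,8], which could be passed to B.pack directly in place of the hand-written offset list.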
