
We need the BSON equivalent of:

{
    "Header": {
        "SubHeader1": {
            "Name": "Bond",
            "License": 7
        },
        "SubHeader2": {
            "IsActive": true
        }
    },
    "Payload": /* This will be a 40GB byte stream! */
}

But what we get is an output file in which the payload comes FIRST, and then the rest of the header!

We're using Json.NET's BSON writer (Bson.BsonWriter.WriteValue(byte[] value)), but it only accepts an actual byte[], not a Stream. Since our payloads will be tens of GB, we must use streams, so we tried the workaround below, but it produces the incorrect result described above:

public void Expt()
{
    // Just some structure classes, defined below
    var fileStruct = new FileStructure();

    using (Stream outputSt = new FileStream("TestBinary.bson", FileMode.Create))
    {
        var serializer = new JsonSerializer();
        var bw = new BsonWriter(outputSt);

        // Start
        bw.WriteStartObject();

        // Write header            
        bw.WritePropertyName("Header");
        serializer.Serialize(bw, fileStruct.Header);

        // Write payload
        bw.WritePropertyName("Payload");
        bw.Flush(); // <== flush !                
        // In reality we write 40GB into the stream; dummy example for now
        byte[] dummyPayload = Encoding.UTF8.GetBytes("This will be a 40GB byte stream!");
        outputSt.Write(dummyPayload, 0, dummyPayload.Length);

        // End
        bw.WriteEndObject();
    }    
}

This looks like a classic case of missing synchronization or unflushed buffers, even though we do issue a Flush to Json.NET before writing the payload to the underlying stream.

Question: Is there another way to do this? We'd rather not fork Json.NET's source (and explore its internal plumbing) or reinvent the wheel somehow ...


Details: The supporting structure classes (if you want to repro this) are:

public class FileStructure
{
    public TopHeader Header { get; set; }
    public byte[] Payload { get; set; }

    public FileStructure()
    {
        Header = new TopHeader
            {
                SubHeader1 = new SubHeader1 {Name = "Bond", License = 007},
                SubHeader2 = new SubHeader2 {IsActive = true}
            };
    }
}

public class TopHeader
{
    public SubHeader1 SubHeader1 { get; set; }
    public SubHeader2 SubHeader2 { get; set; }
}

public class SubHeader1
{
    public string Name { get; set; }
    public int License { get; set; }
}

public class SubHeader2
{
    public bool IsActive { get; set; }
}
  • BsonWriter writes data only at the end of objects (see BsonWriter.WriteEnd). It looks like you'll have to copy quite a few classes into your project and modify them (BsonWriter, BsonBinaryWriter, the whole BsonToken hierarchy etc.) in order to implement writing streams, as they are not designed to be extensible. The feature looks quite useful, so I suggest modifying the code of the library and making a pull request. There will be some limitations, by the way; one requirement is that the stream needs to support telling its length. Commented Jun 13, 2013 at 10:42
  • What was your workaround going to be originally, with respect to the BSON specification requiring a 32-bit signed integer for describing the stream length (which would limit your max 'payload' size to 2 GB)? Commented Jul 9, 2013 at 18:22
  • @DuckMaestro: Hacking/extending the spec. We contemplated hacking the first uint32 into a uint64 (4 bytes => 8 bytes) OR assigning a special meaning to uint32 length=0 as "ignore the first 4 bytes and read the next 8 bytes/uint64". Fortunately, we didn't have to go down that route either, since we adopted an alternative solution (see below). BSON is good but still has some room to grow for generality. Commented Jul 9, 2013 at 21:00
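
The comments above rest on two facts about the BSON layout: every document starts with a signed 32-bit total length, and every binary element carries its own signed 32-bit length before the bytes. The sketch below hand-encodes a single-element document to make that visible. It is not Json.NET code; the class and method names are hypothetical, and it assumes a seekable output so that both lengths can be patched after the payload has been copied.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical illustration of the BSON binary-element layout; not part of Json.NET.
public static class BsonLayoutSketch
{
    // Writes { "<name>": BinData(0, payload) } and patches both length
    // fields once the payload size is known. Requires a seekable output.
    public static void WriteBinaryDocument(Stream output, string name, Stream payload)
    {
        if (!output.CanSeek)
            throw new ArgumentException("Output must be seekable.", nameof(output));

        var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);
        long docStart = output.Position;

        writer.Write(0);                            // int32 document length (patched below)
        writer.Write((byte)0x05);                   // element type: binary
        writer.Write(Encoding.UTF8.GetBytes(name)); // element name ...
        writer.Write((byte)0x00);                   // ... terminated as a cstring
        long payloadLenPos = output.Position;
        writer.Write(0);                            // int32 payload length (patched below)
        writer.Write((byte)0x00);                   // subtype: generic binary
        writer.Flush();

        long payloadStart = output.Position;
        payload.CopyTo(output);                     // streamed, never held in one byte[]
        long payloadLength = output.Position - payloadStart;

        writer.Write((byte)0x00);                   // document terminator
        writer.Flush();
        long docEnd = output.Position;

        // Both fields are signed 32-bit, so anything past 2 GB throws here.
        int payloadLen32 = checked((int)payloadLength);
        int docLen32 = checked((int)(docEnd - docStart));

        output.Position = payloadLenPos;
        writer.Write(payloadLen32);
        output.Position = docStart;
        writer.Write(docLen32);
        writer.Flush();
        output.Position = docEnd;
    }
}
```

The checked casts are where the 2 GB limit raised in the comments shows up: a 40GB payload cannot be described by these length fields without changing the format.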

1 Answer


OK, so we reached some middle ground here, because we don't have the time (at the moment) to fix the otherwise great Json.NET library. Since we're lucky to have the Stream only at the end, we're now using BSON for the header (small enough for a byte[]) and then handing over to a standard stream writer for the payload, i.e. the representation is:

{
    "SubHeader1": {
        "Name": "Bond",
        "License": 7
    },
    "SubHeader2": {
        "IsActive": true
    }
} /* End of valid BSON */
// <= Our Stream is written here, raw byte stream, no BSON

It would have been more aesthetic to have a uniform BSON layout but in the absence of it, this works great too. Probably a bit faster too! If someone still finds a better answer in the future, we're listening.
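
A minimal sketch of that layout, assuming the header has already been serialized to a byte[] (for example by pointing a BsonWriter at a MemoryStream). The class and method names are hypothetical. The reader relies on the fact that a BSON document begins with its own total length as a little-endian int32, so it can tell where the header stops and the raw payload starts.

```csharp
using System;
using System.IO;

// Hypothetical helper for the "BSON header, then raw payload" file layout.
public static class HeaderThenRawPayload
{
    // Writes the already-serialized BSON header, then streams the payload after it.
    public static void Write(Stream output, byte[] bsonHeader, Stream payload)
    {
        output.Write(bsonHeader, 0, bsonHeader.Length);
        payload.CopyTo(output);
    }

    // Returns the BSON header bytes and leaves the stream positioned
    // at the first byte of the raw payload.
    public static byte[] ReadHeader(Stream input)
    {
        var prefix = new byte[4];
        ReadFully(input, prefix, 0, 4);

        // The first four bytes hold the document length, including themselves.
        int length = prefix[0] | (prefix[1] << 8) | (prefix[2] << 16) | (prefix[3] << 24);
        if (length < 5)
            throw new InvalidDataException("Not a BSON document.");

        var header = new byte[length];
        Array.Copy(prefix, header, 4);
        ReadFully(input, header, 4, length - 4);
        return header;
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static void ReadFully(Stream input, byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int read = input.Read(buffer, offset, count);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
            count -= read;
        }
    }
}
```

The bytes returned by ReadHeader can then be wrapped in a MemoryStream and handed to a BSON reader, while the rest of the input is consumed as a plain stream.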


1 Comment

To add, we eventually got rid of BSON and replaced it with a ProtoBuf-style header. Using Marc Gravell's implementation, you can use something like Serializer.DeserializeWithLengthPrefix<YourHeader>(readStream, PrefixStyle.Base128); ... just FYI
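
To illustrate the idea behind a base-128 length prefix, here is a sketch that uses only the standard library rather than the ProtoBuf implementation itself. The names are hypothetical, and no claim is made that the bytes match what that library writes. The header length is stored as a varint (7 bits per byte, high bit set while more bytes follow), then the header bytes, after which a raw payload can follow.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a varint (base-128) length-prefixed header.
public static class VarintPrefixSketch
{
    public static void WriteWithLengthPrefix(Stream output, byte[] header)
    {
        uint remaining = (uint)header.Length;
        while (remaining >= 0x80)
        {
            // Low 7 bits, with the continuation bit set.
            output.WriteByte((byte)((remaining & 0x7F) | 0x80));
            remaining >>= 7;
        }
        output.WriteByte((byte)remaining);
        output.Write(header, 0, header.Length);
    }

    // Returns the header bytes; the stream is left at the byte after them.
    public static byte[] ReadWithLengthPrefix(Stream input)
    {
        uint length = 0;
        int shift = 0;
        while (true)
        {
            int b = input.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            if (shift > 28) throw new InvalidDataException("Length prefix too long.");
            length |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        if (length > int.MaxValue) throw new InvalidDataException("Header too large.");

        var header = new byte[(int)length];
        int offset = 0;
        while (offset < header.Length)
        {
            int read = input.Read(header, offset, header.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return header;
    }
}
```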
