
I understand that converting from binary to string is a slow and expensive operation.

Our project, modelled closely on a Big Data solution, will take an incoming stream of binary data.

Most Big Data solutions suggest using a NoSQL database, like MongoDB or RavenDB, and parsing the data into JSON for easier querying later... This is where I'm confused. I assume the conversion to JSON is only needed for unstructured data.

The incoming data is already semi-structured but, ignoring that, if I convert it to JSON then surely that is a binary-to-string conversion and I'll incur the penalty of the delay while this occurs? And if I want to query the int values, which are now strings, I've got to convert them back again.

I understand that JSON will map the values, which is possibly faster than converting.

If I convert binary data into JSON for easier handling (readability/maintenance etc.) but then need to query numeric values (which were initially queryable in binary format), then surely I'm doing this conversion twice (binary to string and then string back to binary)? Or when we query number values in JSON, is it not converting them back to binary?
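
To make my confusion concrete, here is a minimal Python sketch of the round trip I mean (the field names are made up):

    import json

    # Serialize: the integer becomes characters inside the JSON text.
    text = json.dumps({"sensor_id": 7, "value": 42})
    print(text)  # {"sensor_id": 7, "value": 42}

    # Deserialize: the parser converts the characters back to a native int -
    # this is the second conversion I'm worried about paying for.
    doc = json.loads(text)
    print(type(doc["value"]))  # <class 'int'>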

2 Answers


MongoDB in fact uses BSON to communicate, so it's already binary. You can read it directly in the application without "proper" JSON parsing. However, even if you did, I believe the "penalty" is small enough to be neglected.
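
For illustration, a minimal sketch using the bson package that ships with PyMongo (the field names are made up):

    from bson import encode, decode  # the bson package bundled with PyMongo

    # Encoding goes straight to BSON: the int is written as a binary
    # int32/int64 field, never as a decimal string.
    raw = encode({"sensor_id": 7, "value": 42})
    print(type(raw))  # <class 'bytes'>

    # Decoding restores native Python types with no JSON text stage.
    doc = decode(raw)
    print(type(doc["value"]))  # <class 'int'>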

Remember - Premature optimization is the root of all evil - and you are indeed doing this now.


3 Comments

Premature optimization is the root of all evil - yes, I agree, but we're assuming that based on the fact our incoming stream is about 3GB per second, so this seems like a reasonable thing to optimise :) However, the link to BSON is super! +1
@MyDaftQuestions It's the last thing I would try to optimise, really. In my recent tests with Mongo, data (de)serialization accounted for less than 0.1% of processing time. You can gain much more from proper sharding, indices and application architecture (a reactive stack, maybe); see the index sketch after these comments. To answer directly - most Mongo drivers already use the binary protocol, so there is not much to gain anyway.
Thanks for taking the time to chat to me!
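
As a rough sketch of the indexing point from the comments above (PyMongo, with made-up database and collection names - a sketch, not a tuned setup):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    readings = client.telemetry.readings  # hypothetical database/collection

    # An ascending index on the numeric field queried most often;
    # this is where the real gains tend to come from, not from
    # avoiding (de)serialization.
    readings.create_index([("value", 1)])

    # The range query can now be served from the index.
    cursor = readings.find({"value": {"$gt": 40}})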

The cost of serialization/deserialization is small compared to the gains in interoperability and querying.

For example, if you were to use Azure DocumentDB (https://azure.microsoft.com/en-us/services/documentdb/), converting your binary properties into their representative JSON types (int, string) will give you the ability to query them better (range queries, spatial queries etc.) - https://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/
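
For instance, a hedged sketch of such a range query using the current Python SDK (DocumentDB has since been renamed Azure Cosmos DB; the endpoint, key, and database/container names below are placeholders):

    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com", credential="<key>")
    container = client.get_database_client("telemetry").get_container_client("readings")

    # A server-side range query over a numeric JSON property - the number
    # is compared as a number, with no string round trip in the application.
    for item in container.query_items(
        query="SELECT c.sensorId, c.value FROM c WHERE c.value > @min",
        parameters=[{"name": "@min", "value": 40}],
        enable_cross_partition_query=True,
    ):
        print(item)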

(full disclosure I work at Microsoft on DocumentDB - happy to chat 1-on-1 any time!)

1 Comment

We're logging over 3GB per second of binary data. What is the purpose of serialising (which, for JSON, means serialising to strings) when most of the information is integer values? :(
