
I have a simple use case in which I want to serialize and transmit vectors of integers between 0 and 255. I figured the most space-efficient way to do so would be to serialize each vector as a string whose nth character has a byte value equal to the nth element of the vector. To this end, I wrote the following two functions:

#include <string>
#include <vector>

std::string SerializeToBytes(const std::vector<int> &frag)
{
    std::vector<unsigned char> res;
    res.reserve(frag.size());
    for(int val : frag) {
        // Store each value (expected to be 0-255) as a single byte.
        res.push_back(static_cast<unsigned char>(val));
    }
    return std::string(res.begin(), res.end());
}

std::vector<int> ParseFromBytes(const std::string &serialized_frag)
{
    std::vector<int> res;
    res.reserve(serialized_frag.length());
    // Iterate as unsigned char so bytes >= 128 come back as 128-255, not as negative values.
    for(unsigned char c : serialized_frag) {
        res.push_back(c);
    }
    return res;
}

However, I run into problems when sending this data with JsonCpp. The minimal reproducible example below shows that the problem does not come from the functions above. It appears only after a Json::Value is serialized and then parsed again, at which point some of the encoded data in the string is lost.

#include <cassert>
#include <string>
#include <vector>
#include <json/json.h>

// SerializeToBytes and ParseFromBytes are defined as above.

int main() {
    std::vector<int> frag = { 230 };
    std::string serialized = SerializeToBytes(frag);

    // Passes, so SerializeToBytes and ParseFromBytes are not the problem.
    assert(frag == ParseFromBytes(serialized));

    Json::Value val;
    val["STR"] = serialized;

    // Passes, so the data survives being stored in a Json::Value.
    assert(frag == ParseFromBytes(val["STR"].asString()));

    Json::StreamWriterBuilder builder;
    builder["indentation"] = "";
    std::string serialized_json = Json::writeString(builder, val);
    // serialized_json is now "{\"STR\":\"\\ufffd\"}".

    Json::Value reconstructed_json;
    Json::Reader reader;  // older API; Json::CharReaderBuilder is the recommended replacement
    reader.parse(serialized_json, reconstructed_json);

    // Yields { 239, 191, 189 } instead of the expected { 230 }.
    std::vector<int> frag_from_json = ParseFromBytes(reconstructed_json["STR"].asString());

    // Fails, so the data is lost during the JSON write/parse round trip.
    assert(frag == frag_from_json);

    return 0;
}

What is the cause of this issue, and how can I remedy it? Thanks for any help you can offer.

  • You could remove the intermediate vector res from SerializeToBytes and push the chars directly into a std::string variable, just as you push directly into res in ParseFromBytes. Commented Dec 28, 2021 at 22:37
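A minimal sketch of that suggestion, assuming C++11 or later. The string is built directly, and the assert (an addition that is not in the original code) documents the assumption that every element fits in one byte.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends straight to a std::string instead of building a temporary
// std::vector<unsigned char> first.
std::string SerializeToBytes(const std::vector<int> &frag)
{
    std::string res;
    res.reserve(frag.size());
    for (int val : frag) {
        // Assumption: each element fits in one byte; otherwise data is lost.
        assert(0 <= val && val <= 255);
        res.push_back(static_cast<char>(val));
    }
    return res;
}

// Reading each char as unsigned char keeps bytes >= 128 in [128, 255].
std::vector<int> ParseFromBytes(const std::string &serialized_frag)
{
    std::vector<int> res;
    res.reserve(serialized_frag.size());
    for (unsigned char c : serialized_frag) {
        res.push_back(c);
    }
    return res;
}
```

The round trip is unchanged: ParseFromBytes(SerializeToBytes({0, 127, 128, 230, 255})) gives back the same five values.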

1 Answer


JsonCpp documentation, class Value:

This class is a discriminated union wrapper that can represent a:

  • ...
  • UTF-8 string
  • ...

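{ 230 } is not a valid UTF-8 string: the byte 230 (0xE6, binary 11100110) starts a three-byte sequence, but no continuation bytes follow it. Since Json::Value is only meant to hold UTF-8 strings, you cannot expect Json::writeString(builder, val) to round-trip this data. Here the writer replaced the invalid byte with U+FFFD REPLACEMENT CHARACTER (the \ufffd in your output), whose UTF-8 encoding is exactly the { 239, 191, 189 } you got back.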
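To illustrate the rule behind this, here is a sketch of a structural UTF-8 check. IsStructurallyValidUtf8 is a hypothetical helper, not part of JsonCpp. It only verifies that each lead byte is followed by the right number of continuation bytes (10xxxxxx); it does not reject overlong encodings, surrogates, or code points above U+10FFFF, so it is weaker than a full validator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: structural check of lead/continuation bytes only.
bool IsStructurallyValidUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;                  // continuation bytes required
        if (lead < 0x80) {
            extra = 0;                      // 0xxxxxxx: ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                      // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                      // 1110xxxx, e.g. 230 = 0xE6
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                      // 11110xxx
        } else {
            return false;                   // stray continuation or invalid lead byte
        }
        if (s.size() - i - 1 < extra) {
            return false;                   // sequence cut off at end of string
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;               // expected a 10xxxxxx byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

A lone byte 230 fails the check, while the same byte followed by two continuation bytes (for example E6 97 A5) passes, as does the replacement character EF BF BD.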


2 Comments

I see. What would be the best way to serialize this vector of 0-255 values, then?
Base64 encoding, I guess. If bytes greater than 127 are frequent, Base64 is a more compact encoding than escaping each such byte individually (like \230).
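A minimal sketch of the Base64 approach, assuming standard padded Base64 as described in RFC 4648. EncodeBase64 and DecodeBase64 are hypothetical helper names, not JsonCpp functions, and the encoding itself does not depend on JsonCpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace {
const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Maps a Base64 character to its 6-bit value, or -1 if it is not in the alphabet.
int SextetValue(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}
}  // namespace

// Encodes byte values (assumed to be 0-255) as padded Base64, which is pure ASCII.
std::string EncodeBase64(const std::vector<int> &bytes)
{
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        for (std::size_t k = i; k < i + 3 && k < bytes.size(); ++k) {
            assert(0 <= bytes[k] && bytes[k] <= 255);
        }
        std::size_t left = bytes.size() - i;    // bytes remaining in this group
        std::uint32_t n = static_cast<std::uint32_t>(bytes[i]) << 16;
        if (left > 1) n |= static_cast<std::uint32_t>(bytes[i + 1]) << 8;
        if (left > 2) n |= static_cast<std::uint32_t>(bytes[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += left > 1 ? kAlphabet[(n >> 6) & 63] : '=';
        out += left > 2 ? kAlphabet[n & 63] : '=';
    }
    return out;
}

// Decodes padded Base64; returns an empty vector if a non-alphabet character appears.
std::vector<int> DecodeBase64(const std::string &text)
{
    std::vector<int> out;
    std::uint32_t buffer = 0;   // accumulated bits, newest in the low end
    int bits = 0;               // number of not-yet-emitted bits in buffer
    for (char c : text) {
        if (c == '=') break;    // padding: no more data
        int v = SextetValue(c);
        if (v < 0) return {};
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<int>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

With these helpers, the question's example would store EncodeBase64(frag) in val["STR"] and recover the bytes with DecodeBase64(reconstructed_json["STR"].asString()). Because the Base64 alphabet is plain ASCII, the JSON writer has nothing to replace. Base64 costs 4 characters per 3 input bytes regardless of their values, whereas writing a byte above 127 as a JSON \u00XX escape, for example, costs 6 characters for that single byte.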
