
I have an old.JSON file like this:

[{
    "id": "333333",
    "creation_timestamp": 0,
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"]
}]

Then I create an object based on this .proto file:

message Dataset {
  string id = 1;
  uint64 creation_timestamp = 2;
  string type = 3;
  string owner = 4;
  repeated string datafiles = 6;
}

Now I want to save this object back to another .JSON file. I did this:

import json
from google.protobuf.json_format import MessageToJson

with open("new.json", 'w') as jsfile:
    json.dump(MessageToJson(item), jsfile)

As a result I have:

"{\n  \"id\": \"333333\",\n  \"type\": \"MEDICAL\",\n  \"owner\": \"MED.com\",\n  \"datafiles\": [\n    \"stomach.data\",\n    \"heart.data\"\n  ]\n}"

How can I make this file look like the old.JSON file?

  • In what way was this not like the original? I notice that it's not in a list. Is that the problem? Commented May 7, 2017 at 17:56
  • @tdelaney Yes, it is not a list. It also has \" instead of just ", and \n is explicit. Commented May 7, 2017 at 17:59
  • Have you tried jsfile.write(MessageToJson(item)) directly? Commented May 7, 2017 at 18:03
  • The missing list is likely down to how you save the data in the first place. You defined a message type for a single dict inside the list. From what you've posted here I don't know if you have defined another message type for the enclosing list. But if you just encoded each item of that outer list, you lost the list. As for \n, try printing the string: they get rendered as newlines. The Python representation of a string shows them as \n so you can see them. Commented May 7, 2017 at 18:04
  • @Psidom It works, but the output is not a list. I can add [] to the file manually, though. Commented May 7, 2017 at 18:06

1 Answer


The weird escaping comes from converting the data to JSON twice: the second call has to escape the JSON syntax characters produced by the first call. A detailed explanation follows:

https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-pysrc

"""Contains routines for printing protocol messages in JSON format.

Simple usage example:

  # Create a proto object and serialize it to a json format string.
  message = my_proto_pb2.MyMessage(foo='bar')
  json_string = json_format.MessageToJson(message)

  # Parse a json format string to proto object.
  message = json_format.Parse(json_string, my_proto_pb2.MyMessage())
"""

also

def MessageToJson(message, including_default_value_fields=False):
...
  Returns:
    A string containing the JSON formatted protocol buffer message.

It is pretty clear that this function returns exactly one object of type string. This string contains a lot of JSON structure, but it's still just a string as far as Python is concerned.

You then pass it to a function which takes a Python object (not JSON text) and serializes it to JSON.

https://docs.python.org/3/library/json.html

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.

Okay, how exactly would you encode a string into JSON? Clearly the string can't contain JSON-specific characters such as " unchanged, so those have to be escaped. Maybe there's an online tool, like http://bernhardhaeussner.de/odd/json-escape/ or http://www.freeformatter.com/json-escape.html

You can go there, paste the starting JSON from the top of your question, tell it to generate the escaped JSON, and you get back ... almost exactly what you are getting at the bottom of your question. Cool, everything worked correctly!

(I say almost because one of those links adds some newlines on its own, for no apparent reason. If you encode it with the first link, then decode it with the second, it is exact.)
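
The same effect can be reproduced offline with the standard library alone. This is only a sketch: the dict `record` and the `json.dumps(record, indent=2)` call are stand-ins for the text that MessageToJson(item) returns, so no protobuf installation is assumed.

```python
import json

# Stand-in for the text MessageToJson(item) returns: already-serialized JSON.
record = {
    "id": "333333",
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"],
}
json_text = json.dumps(record, indent=2)

# Serializing the text again treats it as one plain str value, so every quote
# and newline inside it is escaped and the whole thing is wrapped in quotes.
double_encoded = json.dumps(json_text)

# Decoding once only undoes the outer layer and yields the JSON text (a str);
# a second decode is needed to get the actual data structure back.
decoded_once = json.loads(double_encoded)
decoded_twice = json.loads(decoded_once)

print(double_encoded)
```

The printed value has the same shape as the content of new.json in the question: a single quoted string full of \" and \n.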

But that's not the answer you wanted, because you didn't want to double-jsonify the data structure. You just wanted to serialize it to JSON once and write that text to a file:

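from google.protobuf.json_format import MessageToJson

# MessageToJson already returns JSON text, so write it to the file as-is
# instead of passing it through json.dump a second time.
with open("new.json", 'w') as jsfile:
    actual_json_text = MessageToJson(item)
    jsfile.write(actual_json_text)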

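Addendum: MessageToJson might need additional parameters to behave as expected:
including_default_value_fields=True (also emit fields that hold their default value, such as creation_timestamp = 0)
preserving_proto_field_name=True (keep the field names from the .proto file instead of converting them to lowerCamelCase)
(see comments and links below)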
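
If the enclosing list from old.JSON should come back as well, one possible approach is to decode each message's JSON text once and dump the resulting list a single time. The sketch below is an assumption about how that could look, not part of the original answer: `json_texts` stands in for the strings MessageToJson would return, and `write_as_json_list` is a hypothetical helper name. With protobuf installed, MessageToDict from google.protobuf.json_format returns a dict directly and would make the json.loads step unnecessary.

```python
import json
import os
import tempfile

# Stand-ins for the strings MessageToJson would return, one per message.
json_texts = [
    '{\n  "id": "333333",\n  "type": "MEDICAL",\n  "owner": "MED.com",\n'
    '  "datafiles": [\n    "stomach.data",\n    "heart.data"\n  ]\n}',
]


def write_as_json_list(texts, path):
    """Decode each JSON text to a dict, then write them all as one JSON list."""
    records = [json.loads(text) for text in texts]
    with open(path, "w") as jsfile:
        # The data is serialized exactly once here, so nothing gets escaped.
        json.dump(records, jsfile, indent=4)
    return records


# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "new.json")
written = write_as_json_list(json_texts, out_path)
```

The file then holds a list of objects, as old.JSON does. Values may still differ from the original file: fields at their default value are omitted unless requested, and the protobuf JSON mapping writes 64-bit integers such as uint64 as strings.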


3 Comments

Yes, MessageToJson looks good, but it causes a new problem: stackoverflow.com/questions/43835243/…
The key part of the solution is just to change json.dump to jsfile.write. As the answer points out, we don't want to double-jsonify the message.
It's 2021, and a new problem arises: MessageToJson sometimes truncates message fields. See stackoverflow.com/q/69364763/987846
