10

My server is going to be sending a JSON, serialized as a string, through a socket to another client machine. I'll take my final json and do this:

import json
python_dict_obj = { "id" : 1001, "name" : "something", "file" : <???> }
serialized_json_str = json.dumps(python_dict_obj)

I'd like one of the fields in my JSON to hold the contents of a file, encoded as a string.

Performance-wise (but also interoperability-wise), what is the best way to encode a file using Python? Base64? Binary? Just the raw string text?

EDIT - For those suggesting base64, something like this?

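# get file
import base64
import json

python_dict_obj = {"id": 1001, "name": "something"}

# filename is the path of the file to send.
# Open in binary mode so any file type is read as raw bytes.
with open(filename, 'rb') as f:
    filecontents = f.read()
encoded = base64.b64encode(filecontents)
# In Python 3, b64encode returns bytes; json.dumps needs a str.
python_dict_obj['file'] = encoded.decode('ascii')
serialized_json_str = json.dumps(python_dict_obj)

# ... sent to client via socket

# decoding (on the client, applied to the received string)
json_again = json.loads(serialized_json_str)
filecontents_again = base64.b64decode(json_again['file'])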
1
  • In Python 3.5, I needed one more step to get a string in my dict: python_dict_obj['file'] = encoded.decode(). Otherwise, the value was a bytes object b'something', which caused an error during json.dumps. Commented Oct 5, 2018 at 17:29

2 Answers

7

I'd use base64. JSON isn't designed to carry binary data, so unless your file's content is already plain text, it should be encoded as plain text first. Virtually everything can encode and decode base64. If you instead use (for example) Python's repr(file_content), that also produces "plain text", but the receiving end would need to know how to decode the string escapes that Python's repr() uses.
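To make that concrete, here is a minimal sketch (Python 3) of what happens when raw bytes are handed to json.dumps, and how base64 gets around it. The file_bytes value is a made-up stand-in for real file contents, and the dict keys simply mirror the ones in the question.

```python
import base64
import json

# Hypothetical stand-in for bytes read from a binary file.
file_bytes = bytes([0, 255, 16, 128, 10])

# json.dumps has no representation for bytes and raises TypeError.
try:
    json.dumps({"file": file_bytes})
    raw_bytes_accepted = True
except TypeError:
    raw_bytes_accepted = False

# Base64 output is pure ASCII, so it can be turned into a str and stored.
encoded_text = base64.b64encode(file_bytes).decode("ascii")
message = json.dumps({"id": 1001, "file": encoded_text})

# Receiving side: parse the JSON, then undo the base64 step.
restored = base64.b64decode(json.loads(message)["file"])
```

Any client that has a base64 decoder can do the last step, whatever language it is written in.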


2 Comments

Well, on the decoding side, you'd get filecontents from some spelling of json.loads(). After that, you're done with JSON. The base64 decoder applied to filecontents will give you back the original binary file contents. Try it! This is easier to do than to explain ;-)
Oops! After your latest edit, I think you figured it out :-)
3

JSON cannot handle binary data. You will need to encode the data as text before serializing, and Base64 is the easiest encoding to use. You do not need the URL-safe variant unless something further down the processing chain requires it.
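For anyone wondering what the URL-safe form changes, here is a small sketch using the standard library's two encoders. The two input bytes are picked only because they happen to hit the two characters that differ between the alphabets.

```python
import base64

# Two bytes chosen so the output uses alphabet positions 62 and 63.
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data)          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(data)  # uses '-' and '_' instead

# Each variant decodes back to the same bytes with its matching decoder.
from_standard = base64.b64decode(standard)
from_url_safe = base64.urlsafe_b64decode(url_safe)
```

Both '+' and '/' are legal inside a JSON string, which is why the standard alphabet is fine here. The URL-safe one only matters if the value later ends up in a URL or a filename.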

5 Comments

Is it also worth base64 encoding the entire JSON string as well?
Only if something further down the chain requires it. But JSON is plain text regardless (except regarding ensure_ascii, but that's a different issue which you'll either already know how to handle or can safely ignore).
so I wouldn't gain any size reduction by doing that? (base64 encoding file then base64 encoding surrounding json as well)
No size reduction: base64 encoding generally increases the number of bytes needed. After all, it's only using 6 of each 8 bits per byte (2**6 == 64, the number of distinct possible values in a base64 encoding).
Encoding as Base64 increases the size of the data by about 33%: every 3 input bytes become 4 output characters, with padding rounding the last group up.
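The size arithmetic from these comments can be checked with a quick sketch. The 3000-byte payload and the b64_len helper are made up for illustration; the helper just applies the 3-bytes-in, 4-characters-out rule with padding.

```python
import base64

def b64_len(n):
    """Length of padded base64 output for n input bytes (4 chars per 3 bytes)."""
    return 4 * ((n + 2) // 3)

# Hypothetical payload: 3000 zero bytes standing in for a file.
payload = bytes(3000)

once = base64.b64encode(payload)   # the file, base64-encoded
twice = base64.b64encode(once)     # base64 applied a second time

overhead_once = len(once) / len(payload)    # roughly 1.33
overhead_twice = len(twice) / len(payload)  # roughly 1.78
```

So wrapping the already-encoded data in a second layer of base64 compounds the overhead rather than reducing it.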
