I am caching some JSON data, and in storage it is represented as a JSON-encoded string. The server performs no work on the JSON before sending it to the client, other than collating multiple cached objects, like this:

def get_cached_items():
  item1 = cache.get(1)
  item2 = cache.get(2)
  return json.dumps(dict(item1=item1, item2=item2, msg="123"))

There may be other items included with the return value, in this case represented by msg="123".

The issue is that the cached items are double-escaped. It would behoove the library to allow a pass-through of the string without escaping it.

I have looked at the documentation for the default argument of json.dumps, as it seems to be the place where one would address this, and searched on Google/SO but found no useful results.

It would be unfortunate, from a performance perspective, if I had to decode the JSON of each cached item to send it to the browser. It would be unfortunate from a complexity perspective to not be able to use json.dumps.

My inclination is to write a class that stores the cached string, so that when the default handler encounters an instance of this class it uses the string without performing any escaping. I have yet to figure out how to achieve this, though, and I would be grateful for thoughts and assistance.

EDIT For clarity, here is an example of the proposed default technique:

class RawJSON(object):
   def __init__(self, str):
       self.str = str

class JSONEncoderWithRaw(json.JSONEncoder):
   def default(self, o):
       if isinstance(o, RawJSON): 
          return o.str # but avoid call to `encode_basestring` (or ASCII equiv.)
       return super(JSONEncoderWithRaw, self).default(o)

Here is a degenerate example of the above:

>>> class M():
       str = ''
>>> m = M()
>>> m.str = json.dumps(dict(x=123))
>>> json.dumps(dict(a=m), default=lambda (o): o.str)
'{"a": "{\\"x\\": 123}"}'

The desired output would include the unescaped string m.str, being:

'{"a": {"x": 123}}'

It would be good if the json module did not encode/escape the return value of the default function, or if that escaping could be avoided. In the absence of a method via the default parameter, one may have to achieve the objective here by overriding the encode and iterencode methods of JSONEncoder, which brings challenges in terms of complexity, interoperability, and performance.

3 Answers

6

A quick-n-dirty way is to patch json.encoder.encode_basestring*() functions:

import json

class RawJson(unicode):
    pass

# patch json.encoder module
for name in ['encode_basestring', 'encode_basestring_ascii']:
    def encode(o, _encode=getattr(json.encoder, name)):
        return o if isinstance(o, RawJson) else _encode(o)
    setattr(json.encoder, name, encode)


print(json.dumps([1, RawJson(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]

4

If you are caching JSON strings, you need to first decode them to Python structures; there is no way for json.dumps() to distinguish between normal strings and strings that are really JSON-encoded structures:

return json.dumps({'item1': json.loads(item1), 'item2': json.loads(item2), 'msg': "123"})

Unfortunately, there is no option to include already-converted JSON data in this; the default function is expected to return Python values. You extract data from whatever object is passed in and return a value that can be converted to JSON, not a value that is already JSON itself.
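To illustrate the point about default, here is a minimal sketch, assuming Python 3. The Cached class is a hypothetical wrapper invented for this example; it is not part of the json module. A string returned from default is treated like any other Python string and comes out as a quoted JSON string, while a decoded structure is serialized as nested JSON:

```python
import json


class Cached:
    """Hypothetical wrapper around an already-encoded JSON string."""

    def __init__(self, raw):
        self.raw = raw


wrapped = Cached('{"x": 123}')

# default returns the raw string: the encoder sees a Python str and
# emits it as a quoted, escaped JSON string.
as_string = json.dumps({"a": wrapped}, default=lambda o: o.raw)

# default returns a decoded structure: the encoder serializes it as
# a nested JSON object.
as_structure = json.dumps({"a": wrapped}, default=lambda o: json.loads(o.raw))

print(as_string)     # {"a": "{\"x\": 123}"}
print(as_structure)  # {"a": {"x": 123}}
```

The second call produces the desired output, but only by paying for the json.loads() step.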

The only other approach I can see is to insert "template" values, then use string replacement techniques to manipulate the JSON output to replace the templates with your actual cached data:

json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)
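For what it's worth, a self-contained sketch of the template-replacement idea, assuming Python 3. The literal strings stand in for the cached values, and the function name splice_cached is hypothetical:

```python
import json

# Hypothetical stand-ins for the cached, already-encoded JSON strings.
item1 = '{"x": 123}'
item2 = '["abc", 2]'


def splice_cached(item1, item2):
    # Serialize the envelope with placeholder strings first.
    json_data = json.dumps(
        {"item1": "==item1==", "item2": "==item2==", "msg": "123"}
    )
    # json.dumps emits each placeholder as a quoted JSON string, so the
    # surrounding quotes are replaced along with the placeholder text.
    return (
        json_data
        .replace('"==item1=="', item1)
        .replace('"==item2=="', item2)
    )


spliced = splice_cached(item1, item2)
print(spliced)  # {"item1": {"x": 123}, "item2": ["abc", 2], "msg": "123"}
```

This only holds up if the placeholder text cannot occur anywhere else in the output (including inside the cached data inserted by the first replace), and if the cached strings are valid JSON to begin with.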

A third option is to cache item1 and item2 in non-serialized form, as a Python structure instead of a JSON string.
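A rough sketch of that third option, assuming Python 3 and using a plain dict as a hypothetical stand-in for the real cache:

```python
import json

# Hypothetical in-memory cache holding Python structures, not JSON strings.
cache = {1: {"x": 123}, 2: ["abc", 2]}


def get_cached_items():
    # Nothing to decode here: the structures are serialized exactly once,
    # when the response is built.
    return json.dumps({"item1": cache[1], "item2": cache[2], "msg": "123"})


response = get_cached_items()
print(response)  # {"item1": {"x": 123}, "item2": ["abc", 2], "msg": "123"}
```

The trade-off is that every response pays the encoding cost for the cached items, instead of paying it once when they are stored.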

7 Comments

Thanks Martijn. I take it then that any type of return from the default parameter function shall be escaped? The answer wasn't obvious from my implementation (homebrew's Python 2.7 on Mac OS X), but I'd believe you if you say so. ;)
@BrianM.Hunt: The default handler is only used for value types that json.dumps doesn't know how to handle. str is easy, the module knows exactly how to handle it, so the default parameter is not consulted for those.
And 'escaping' isn't the correct term here; json.dumps() takes Python structures as input and produces JSON output. Give it a Python string and out comes a JSON string, regardless of what is inside of the Python string. You should not conflate the two types, even though JSON is byte data and contained in a Python string.
Hi Martijn, thanks. You'll see from the original question that I was contemplating passing a new type of class to json.dumps (not descended from basestring) and having the default handler return a string that is not escaped. The terminology in this context is "escape" or "encode" (see python2.7/json/encoder.py:encode_basestring). The problem seems to be that something in either c_make_encoder or encode_basestring in encoder.py escapes the strings returned from default, which is the only place where we have access to the encoding mechanism (contra. simplejson's for_json?).
@BrianM.Hunt: Expanded; default is not the way to go, as its return value is supposed to be more Python structures, not JSON bytes.
2

You can use simplejson, which is better maintained than the standard json module and provides this functionality.

import simplejson as json
from simplejson.encoder import RawJSON

print(json.dumps([1, RawJSON(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]

You get simplicity of code, plus all the C optimisations of simplejson.
