1

How can I parse a JSON string that has embedded Javascript constants or variables?

For example, how to parse a JSON string like this one?

  {
    "menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {
            "value": "New",
            "onclick": Handlers.NEW
          },
          {
            "value": "Open",
            "onclick": Handlers.OPEN
          },
          {
            "value": "Custom",
            "onclick": "function(){doSomething(Handlers.OPEN);}"

          }
        ]
      }
    }
  }

All validators of course consider the JSON to be invalid, yet it is perfectly valid when evaluated in a context where the corresponding Javascript objects are defined.

The first thing that comes to mind is to pre-process the string before feeding it to the JSON parser, but that is tricky, since the same strings can occur inside existing strings (as shown in the sample JSON), and it would require some regex fiddling in order to reliably detect whether e.g. Handlers.NEW is used as an undecorated value, or inside an existing string value.

Is there a clean way to handle this use case without having to do manual regex replacements?

1
  • 1
    It is in fact not valid JSON (although it is a valid Javascript object). You might need to write your own parser for this. Commented Feb 23, 2016 at 17:47

2 Answers 2

2

You can use the ast-module:

import ast

data = """{
    "menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {
            "value": "New",
            "onclick": Handlers.NEW
          },
          {
            "value": "Open",
            "onclick": Handlers.OPEN
          },
          {
            "value": "Custom",
            "onclick": "function(){doSomething(Handlers.OPEN);}"

          }
        ]
      }
    }
  }"""

def transform(item):
    if isinstance(item, ast.Dict):
        return dict(zip(map(transform,item.keys), map(transform, item.values)))
    elif isinstance(item, ast.List):
        return map(transform, item.elts)
    elif isinstance(item, ast.Str):
        return item.s
    else:
        return item

print transform(ast.parse(data).body[0].value)
Sign up to request clarification or add additional context in comments.

Comments

1
import ast

s="""{
"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {
        "value": "New",
        "onclick": Handlers.NEW
      },
      {
        "value": "Open",
        "onclick": Handlers.OPEN
      },
      {
        "value": "Custom",
        "onclick": "function(){doSomething(Handlers.OPEN);}"

      }
    ]
  }
}
}"""

def evaluate(obj):
    if isinstance(obj, ast.Module):
        return evaluate(obj.body[0])
    elif isinstance(obj, ast.Expr):
        return evaluate(obj.value)
    elif isinstance(obj, ast.Dict):
        return {key.s: parse(value) for key, value in zip(obj.keys, obj.values)}
    elif isinstance(obj, ast.List):
        return [parse(element) for element in obj.elts]
    elif isinstance(obj, ast.Str):
        return obj.s
    elif isinstance(obj, ast.Attribute):
        return evaluate(obj.value) + "." + obj.attr
    elif isinstance(obj, ast.Name):
        return obj.id
    elif isinstance(obj, ast.Num):
        return obj.n
    else:
        print("I apparently forgot", type(obj))

x = evaluate(ast.parse(s))
print(x)

This parses the string into an Abstract Syntax Tree, and then recursively builds a Python object out of it, converting attributes into strings.

2 Comments

@ccpizza Thanks! If you find a case where it breaks, let me know. The only cases that would be unfixable would be an object that isn't syntactically valid Python.
works like a charm with a pretty complex JSON that is almost 1MB in size. Thank you so much!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.