
I am running into an issue where the JSON produced by a Ruby script cannot be parsed by JavaScript's JSON.parse. Consider the following example:

# Ruby
require 'json'
hash = {}
hash["key"] = "value with \u001a unicode"
hash.to_json
=> '{"key":"value with \u001a unicode"}'

// JavaScript
JSON.parse('{"key":"value with \u001a unicode"}')
=> JSON.parse: bad control character in string literal at line 1 column 2 of the JSON data

The issue is the Unicode character \u001a. The solution would be to escape \u001a to \\u001a, but the \u001a is inserted into the string automatically by Ruby, and I can't reliably post-process the result. Any ideas about how to solve this?

Please note that I wish to call JSON.parse inside a JavaScript execution environment, not inside Ruby's interpreter.

  • I ran your code and I'm actually getting this as output: => "{\"key\":\"value with \\u001a unicode\"}" Commented Apr 24, 2015 at 21:21
  • I ran your code also and it worked fine. Commented Apr 24, 2015 at 21:21
  • You are looking at the output in the terminal. \\u001a in the terminal is the physical string \u001a. Ruby displays the backslash as \\ so you can tell the difference between the single character \u001a and the six-character string also written \u001a. Commented Apr 24, 2015 at 21:29
  • Also note that JSON.parse should be called inside a JavaScript execution environment, not inside the Ruby interpreter. Commented Apr 24, 2015 at 21:31

3 Answers


The short version is that you're interpreting your string as a Javascript expression before attempting to decode it as JSON.

U+001A is a control character. RFC 4627 explicitly disallows unescaped control characters U+0000-U+001F in quoted strings. Your problem here is not that the JSON is invalid, but that you are unescaping your control characters before attempting to parse them as JSON.

When you dump the string "\u001a" from Ruby and copy and paste it into a Javascript interpreter, the escape sequence translates to an unescaped control character, which is not a valid character in JSON! Non-prohibited characters work just fine - you can happily JSON.parse('["\u0020"]'), for example.

However, if you don't interpret the string as Javascript, and instead read it as raw bytes, it will parse correctly.

$ irb
irb(main):001:0> require 'json'
=> true
irb(main):003:0> open("out.json", "w") {|f| f.print JSON.dump(["\u001a"]) }
=> nil

$ node -e 'require("fs").readFile("out.json", function(err, data) { console.log(JSON.parse(data)); });'
[ '\u001a' ]

If you're going to be copy-pasting, you need to copy an escaped version of the string, so that when the string literal is evaluated by your Javascript engine, the double-escaped sequences unescape back to escape sequences rather than raw control characters. So, rather than copying the output of JSON.dump(["\u001a"]), you should be copying the output of puts JSON.dump(["\u001a"]).inspect, which will correctly escape any escape sequences in the string.
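For example, a quick irb sketch of the difference (exact prompts aside, this is the output I'd expect from Ruby's JSON generator):

irb(main):001:0> require 'json'
=> true
irb(main):002:0> puts JSON.dump(["\u001a"])           # pasting this into a JS string literal loses the backslash
["\u001a"]
=> nil
irb(main):003:0> puts JSON.dump(["\u001a"]).inspect   # this form survives as a JS string literal
"[\"\\u001a\"]"
=> nil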


2 Comments

Is there a way to write out the properly escaped version of the string? I'm writing the string out to a file and then someone else is reading the file in and copying the string into a JavaScript file (programmatically).
If you're writing it out with Javascript, JSON.stringify(json_string). If you're writing it with Ruby, JSON.dump(json_string).
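A minimal Ruby sketch of that suggestion, assuming the file contents will later be pasted into JavaScript source as a string literal (the file name is just illustrative):

require 'json'

json_text = { "key" => "value with \u001a unicode" }.to_json
# json_text is the JSON document {"key":"value with \u001a unicode"}
# Encode the JSON text itself as a JSON string so its backslashes survive
# being embedded in JavaScript source as a string literal:
File.write("payload.txt", JSON.dump(json_text))
# payload.txt now contains: "{\"key\":\"value with \\u001a unicode\"}"

Reading that back into a JavaScript file as a string literal and calling JSON.parse on it then behaves like the file-based example above.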

For me, the following Ruby code gives "{\"key\":\"value with \\u001a unicode\"}" as output.

And JSON.parse is also able to parse it, and gives Object {key: "value with unicode"}.

1 Comment

You're looking at the output in the terminal. It escapes the displayed string so you can see the characters. Otherwise, how could you tell the difference between \\u001a and \u001a? So \\u001a is the literal string \u001a without Unicode escaping. To see the difference, compare the results of "\\u001a".size and "\u001a".size. Notice that the length of \\u001a is 6, not 7, meaning that Ruby is displaying the backslash escaped.
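For reference, a quick irb check of that size comparison:

irb(main):001:0> "\u001a".size    # a single control character
=> 1
irb(main):002:0> "\\u001a".size   # backslash, u, 0, 0, 1, a
=> 6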

According to the RFC:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

I ran your code in irb and got the following:

1.9.3-p484 :001 > require 'json'
 => true
1.9.3-p484 :002 >
1.9.3-p484 :003 >   hash = {}
 => {}
1.9.3-p484 :004 > hash["key"] = "value with \u001a unicode"
 => "value with \u001A unicode"
1.9.3-p484 :005 > hash.to_json
 => "{\"key\":\"value with \\u001a unicode\"}"

Then running the returned string in a javascript console, I get the following:

> JSON.parse("{\"key\":\"value with \\u001a unicode\"}")
> Object {key: "value with  unicode"}

It is returning an object. To get the value with the unicode character, access the key on the parsed object:

> str = JSON.parse("{\"key\":\"value with \\u001a unicode\"}")
> Object {key: "value with  unicode"}
> str.key
> "value with  unicode"

4 Comments

JSON.parse should be executed in a JavaScript execution environment, not inside Ruby's interpreter.
@Max actually, that works too. Just copy-pasted it into Chrome's console. Those are even different languages!.. Whatever.
@D-side take a look at the accepted answer if you'd like an explanation of why Ruby's console output works. The console output is not the exact string returned by the to_json call.
@Max I actually know. I've hit a similar issue before when escaping shell commands in Ruby. The general rule here is: know when and how many times your input will be unescaped.
