
I have a file that contains multiple JSON objects that are not separated by commas:

{
  "field" : "value",
  "another_field": "another_value"
} // no comma
{ 
  "field" : "value"
}

Each object on its own is valid JSON.

Is there a way that I can process this file easily?

  1. I know this is NOT valid JSON, but unfortunately the file is generated by a third-party tool, and I have no way of changing what its output looks like.
  2. I can't open a text editor and smart-insert commas / square brackets before the run, since this is an automated process (I also really don't want to write code that opens the file and manipulates it).

In .NET there's a library with exactly this feature: https://stackoverflow.com/a/29480032/2970729 (see also https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm)

Is there anything equivalent in Ruby?

  • "I can't open a text editor and smart-insert commas / square brackets before the run" -- You don't need to open a text editor to edit a file! Why not just run sed on the file? Commented Oct 17, 2017 at 15:48
  • Are there nested hashes in the files, or can we assume that adding a , after each } would be a huge step forward? Commented Oct 17, 2017 at 15:54
  • @TomLord I didn't know this command; looking into it right now. I actually can run bash commands before the processing, so it might be a proper solution to my problem, but I'm still wondering whether there's a Ruby solution. Commented Oct 17, 2017 at 15:56
  • @spickermann My objects are not complex and don't contain nested objects (currently...) Commented Oct 17, 2017 at 15:56

3 Answers


As long as your file is that simple, you might want to do something like this:

# content = File.read(filename)
content =<<-EOF
  {
    "field" : "value",
    "another_field": "another_value"
  } // no comma
  { 
    "field" : "value"
  }
EOF

require 'json'

JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
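The gsub trick only holds for flat objects, which is why the comments asked about nesting. Below is a minimal sketch that wraps the same one-liner in a method (the name `parse_flat_objects` is hypothetical) and spells out the assumption: no nested objects, and no string value containing a `}` followed later by a `{`.

```ruby
require 'json'

# Hypothetical wrapper around the gsub approach above.
# Assumption (same as the answer): objects are flat, with no nested objects
# and no string value containing "}" followed later by "{".
def parse_flat_objects(content)
  # Replace everything from a closing brace up to the next opening brace
  # (including notes such as "// no comma") with "},{", then wrap in [ ].
  JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
end

# Reading from disk works the same way (hypothetical file name):
# parse_flat_objects(File.read('output.json'))
```

With a nested object such as `{"a": {"b": 1}} {"c": 2}`, the regex matches `}} {` and rewrites it to `},{`, which leaves the first object unclosed, so `JSON.parse` raises `JSON::ParserError`.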

1 Comment

Works as expected :) Thanks a lot!

The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.

require 'yajl'

File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with object
  end
end

See the documentation for other options (buffer size, symbolized keys, etc).
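yajl-ruby is a C extension. If adding a native gem is not an option, its per-object block interface can be roughly approximated with the standard library. The sketch below is not how yajl works internally; it is a hypothetical `each_json_object` helper that reads an IO one character at a time, tracks brace depth and whether it is inside a string, and yields each object once its braces balance. It assumes every top-level document is a JSON object and that any text between documents can be discarded.

```ruby
require 'json'

# Hypothetical helper: yields each top-level JSON object read from +io+.
# Assumes every document is an object ({...}) and that anything between
# documents (whitespace, "// no comma" notes) can be dropped.
def each_json_object(io)
  buffer = +''
  depth = 0
  in_string = false
  escaped = false

  io.each_char do |ch|
    # Outside any document, skip everything until the next opening brace.
    next if depth.zero? && ch != '{'

    buffer << ch
    if in_string
      if escaped
        escaped = false          # this char was escaped, e.g. \" or \\
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true           # braces inside strings are not counted
    elsif ch == '{'
      depth += 1
    elsif ch == '}'
      depth -= 1
      if depth.zero?
        yield JSON.parse(buffer) # braces balanced: one complete document
        buffer = +''
      end
    end
  end
end

# Usage, mirroring the yajl example (hypothetical file name):
# File.open('file.json') do |f|
#   each_json_object(f) { |object| p object }
# end
```

Reading one character at a time keeps memory proportional to the largest single object rather than the whole file, but it is likely much slower than a C parser such as yajl.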



If you know the data will be valid JSON documents, you can use this method to split the string up into documents, and then parse each document.

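def split_documents(str)
  res = []
  depth = 0
  start = 0
  # Match a single brace, or a complete string literal (including escaped
  # quotes and backslashes) so that braces inside strings are not counted.
  str.scan(/[{}]|"(?:\\.|[^"\\])*"/) do |token|
    if token == '{'
      depth += 1
    elsif token == '}'
      depth -= 1
      if depth == 0
        # Character offset of the closing brace that ends this document.
        match_start = Regexp.last_match.begin(0)
        res << str[start..match_start]
        start = match_start + 1
      end
    end
  end
  res
end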

This scans the string for {, }, or string literals. Each time it hits a {, it increases the depth by 1; each time it hits a }, it decreases the depth by 1. Whenever the depth returns to zero, the braces are balanced, so you know you have reached the end of a document. The regex also has to match whole strings so that it doesn't accidentally count braces inside them, e.g. { "foo": "ba}r" }.
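A variant of the same depth-counting idea, sketched with the standard library's StringScanner. The hypothetical `parse_documents` helper parses each document as soon as its braces balance and skips anything between documents, so a note like "// no comma" is ignored. It assumes every document is a JSON object and the input is well-formed. StringScanner#pos counts bytes, so the document is cut out with `byteslice` rather than character indexing.

```ruby
require 'json'
require 'strscan'

# Hypothetical helper: brace-depth splitting plus parsing in one pass.
# Assumes every top-level document is a JSON object and the input is valid.
def parse_documents(str)
  scanner = StringScanner.new(str)
  docs = []
  depth = 0
  start = nil

  until scanner.eos?
    # Skip a whole string literal so braces inside it are never counted.
    next if scanner.skip(/"(?:\\.|[^"\\])*"/)

    case scanner.getch
    when '{'
      start = scanner.pos - 1 if depth.zero? # byte offset of the opening brace
      depth += 1
    when '}'
      depth -= 1
      if depth.zero?
        docs << JSON.parse(str.byteslice(start, scanner.pos - start))
      end
    end
  end
  docs
end
```

For example, `parse_documents('{"foo": "ba}r"} {"x": {"y": 1}}')` returns two hashes: the brace inside "ba}r" is ignored and the nested object stays intact.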

