19

Let's say I had the string

"[1,2,[3,4,[5,6]],7]"

How would I parse that into the array

[1,2,[3,4,[5,6]],7]

?

Nesting structures and patterns are completely arbitrary in my use case.

My current ad-hoc solution involves adding a space after every comma and using YAML.load, but I'd like a cleaner one if possible.

(One that does not require external libraries if possible)

4 Answers

45

That particular example can be parsed correctly with JSON:

s = "[1,2,[3,4,[5,6]],7]"
#=> "[1,2,[3,4,[5,6]],7]"
require 'json'
#=> true
JSON.parse s
#=> [1, 2, [3, 4, [5, 6]], 7]

If that doesn't work, you can try running the string through eval, but you have to make sure that no actual Ruby code has been passed in, because eval can be exploited as an injection vulnerability.

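Edit: Here is a simple recursive, regex-based parser. It does no validation, is untested, and is not meant for production use:

def my_scan(s)
  res = []
  # match[1] holds a run of digits, match[3] the text between a pair of brackets.
  # Note: (.+) is greedy, so sibling arrays such as "[[1],[2]]" are not split correctly.
  s.scan(/((\d+)|(\[(.+)\]))/) do |match|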
    if match[1]
      res << match[1].to_i
    elsif match[3]
      res << my_scan(match[3])
    end
  end
  res
end

s = "[1,2,[3,4,[5,6]],7]"
p my_scan(s).first #=> [1, 2, [3, 4, [5, 6]], 7]
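
For a hand-written parser that also copes with sibling arrays such as "[[1],[2,3]]", one option is a small recursive-descent parser built on StringScanner from the standard library. The following is only a sketch under stated assumptions: the method names parse_nested, parse_array and parse_value are hypothetical, and it accepts nothing but non-negative integers, brackets, commas and whitespace.

```ruby
require 'strscan'

# Hypothetical entry point: parses a string holding nested arrays of
# non-negative integers and returns the corresponding Ruby array.
def parse_nested(str)
  scanner = StringScanner.new(str)
  scanner.skip(/\s*/)
  result = parse_array(scanner)
  scanner.skip(/\s*/)
  raise ArgumentError, "trailing input at position #{scanner.pos}" unless scanner.eos?
  result
end

# Parses "[" value ("," value)* "]" or the empty array "[]".
def parse_array(scanner)
  raise ArgumentError, "expected '[' at position #{scanner.pos}" unless scanner.skip(/\[\s*/)
  items = []
  unless scanner.skip(/\]/)
    loop do
      items << parse_value(scanner)
      scanner.skip(/\s*/)
      next if scanner.skip(/,\s*/)   # another value follows
      break if scanner.skip(/\]/)    # end of this array
      raise ArgumentError, "expected ',' or ']' at position #{scanner.pos}"
    end
  end
  items
end

# A value is either a nested array or an integer.
def parse_value(scanner)
  if scanner.check(/\[/)
    parse_array(scanner)
  elsif (digits = scanner.scan(/\d+/))
    digits.to_i
  else
    raise ArgumentError, "expected integer or '[' at position #{scanner.pos}"
  end
end

p parse_nested("[1,2,[3,4,[5,6]],7]") #=> [1, 2, [3, 4, [5, 6]], 7]
```

Because each opening bracket is matched against its own closing bracket rather than the last one in the string, nesting depth and sibling order are preserved, and malformed input raises ArgumentError instead of being silently skipped.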

6 Comments

I'd like to use this, but I can't quite get json to run properly on my computer, and in any case it wouldn't be much cleaner than the yaml solution. Is there a way to manually code this parsing?
Not sure what you mean by "not clean", as it takes a single method call to parse it. You could of course either write a simple regex-based parser of your own, or use a dedicated tool such as treetop.rubyforge.org, but neither of those is as simple as JSON.parse IMHO.
Oh, and JSON is part of Ruby's standard library, at least in 1.9.x.
If you have a multi-type array like s = "['hello', 2, 'test', 5.0]", JSON will fail to parse with a generic error unexpected token at .... However, YAML does work as shown in @Arup's answer: YAML.load(s) => ["hello", 2, "test", 5.0].
@ChrisCirefice: That's because single quoted strings are not valid JSON.
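
To illustrate the point about quoting, here is a small sketch. The helper name parse_json_array is hypothetical; it simply wraps JSON.parse and returns nil when the input is not valid JSON.

```ruby
require 'json'

# Hypothetical helper: returns the parsed value, or nil if the
# string is not valid JSON (e.g. it uses single-quoted strings).
def parse_json_array(str)
  JSON.parse(str)
rescue JSON::ParserError
  nil
end

p parse_json_array("['hello', 2, 'test', 5.0]")  #=> nil
p parse_json_array('["hello", 2, "test", 5.0]')  #=> ["hello", 2, "test", 5.0]
```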
18

The same can be done using YAML from Ruby's standard library:

require 'yaml'
s = "[1,2,[3,4,[5,6]],7]"
YAML.load(s)
# => [1, 2, [3, 4, [5, 6]], 7]

2 Comments

+1; YAML successfully loads multi-type arrays, e.g. "['hello', 2, 'test', 5.0]", where JSON fails to parse.
This has the advantage that it does not raise an error if there is a nil element, but it outputs the nil as the string 'nil', so you still need to convert that to a real nil.
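
One possible way to do that conversion is a recursive pass over the parsed result. This is a minimal sketch: the helper name nilify is hypothetical, and it assumes that no element is legitimately meant to be the string "nil".

```ruby
require 'yaml'

# Hypothetical helper: walks nested arrays and replaces the
# string "nil" with a real nil; everything else is left as-is.
def nilify(value)
  case value
  when Array then value.map { |item| nilify(item) }
  when "nil" then nil
  else value
  end
end

parsed = YAML.load("[1, nil, [2, nil]]")
p parsed          #=> [1, "nil", [2, "nil"]]
p nilify(parsed)  #=> [1, nil, [2, nil]]
```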
6

"Obviously" the best solution is to write your own parser. [ If you like writing parsers, have never done it before and want to learn something new, or want control over the exact grammar ]

require 'parslet'

class Parser < Parslet::Parser
  rule(:space)       { str(' ') }
  rule(:space?)      { space.repeat(0) }
  rule(:openbrace_)  { str('[').as(:op) >> space? }
  rule(:closebrace_) { str(']').as(:cl) >> space? }
  rule(:comma_)      { str(',') >> space?  }
  rule(:integer)     { match['0-9'].repeat(1).as(:int) }
  rule(:value)       { (array | integer) >> space? }
  rule(:list)        { value >> ( comma_ >> value ).repeat(0) }
  rule(:array)       { (openbrace_ >> list.maybe.as(:list) >> closebrace_ )}
  rule(:nest)        { space? >> array.maybe }
  root(:nest)
end

class Arr
  def initialize(args)
    @val = args
  end
  def val
    @val.map{|v| v.is_a?(Arr) ? v.val : v}
  end
end


class MyTransform < Parslet::Transform
  rule(:int => simple(:x))      { Integer(x) }
  rule(:op => '[', :cl => ']')  { Arr.new([]) }
  rule(:op => '[', :list => simple(:x), :cl => ']')   {  Arr.new([x]) }
  rule(:op => '[', :list => sequence(:x), :cl => ']')   { Arr.new(x) }
end

def parse(s)
  MyTransform.new.apply(Parser.new.parse(s)).val
end

parse " [   1  ,   2  ,  [  3  ,  4  ,  [  5   ,  6  , [ ]]   ]  ,  7  ]  "

Parslet transforms will match a single value as "simple", but if that value is itself an array you soon get arrays of arrays, and then you have to start using subtree. Returning objects, however, is fine: each one represents a single value when transforming the layer above, so sequence will match fine.

Combine the trouble of returning bare arrays with the fact that Array([x]) and Array(x) give you the same thing, and you get very confusing results.

To avoid this I made a helper class called Arr, which represents an array of items, so I can dictate what gets passed into it. The parser then keeps all the brackets, even for the example that @MateuszFryc called out :) (thanks @MateuszFryc)

6 Comments

I would say that it isn't necessarily the obviously best solution, as it depends on the input and its format and how it is generated. However, it is one of the most flexible. Also, a full working parslet example is a rare treat so +1 to you!
how about [[[1],[2,3]]] ?
@MateuszFryc - puts (parse "[[[1],[2,3]]]").inspect => [[1], [2, 3]] - seems to work to me :)... oh I see ...we lost the outer brackets?
@NigelThorne - thanks for the update, but there still seems to be a discrepancy in your parser/transformer. Take the "[]" array, for example: it produces [nil]. It looks like the rule rule(:op => '[', :cl => ']') { Arr.new([]) } is never matched, contrary to what you expected. The input is matched instead by rule(:op => '[', :list => simple(:x), :cl => ']') { Arr.new([x]) }, hence the problem.
You could simply add compact to that rule: rule(:op => '[', :list => simple(:x), :cl => ']') { Arr.new([x].compact) }, and completely delete the one you assumed was being used, rule(:op => '[', :cl => ']') { Arr.new([]) }.
3

Use eval

array = eval("[1,2,[3,4,[5,6]],7]")

5 Comments

This isn't a part my application that I feel safe to leave vulnerable to injections, sorry.
@Justin L., A "clean room" + "sandbox" will protect you from the evils of eval: stackoverflow.com/questions/2045324/… . About all that is left to protect against is code that runs a long time; Timeout can take care of that.
Please add a note of the security risks to your answer so you don't get downvotes. Some suggestion as to how to mitigate the risks would also be valuable.
I agree with the necessity of adding a warning about security, so I downvoted this otherwise fine solution.
bad practice :/
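
As one possible mitigation along the lines requested above, the input can be checked against a strict character whitelist before it ever reaches eval. This is only a sketch under stated assumptions: guarded_eval and ALLOWED_INPUT are hypothetical names, and the whitelist covers nothing but digits, brackets, commas and whitespace. It limits what eval can see; it does not verify that the string is a single well-formed array, and JSON.parse remains the simpler choice when it is available.

```ruby
# Hypothetical whitelist: only digits, brackets, commas and whitespace.
ALLOWED_INPUT = /\A[\[\]\d,\s]*\z/

# Hypothetical helper: refuses any string containing characters
# outside the whitelist, then evaluates what is left.
def guarded_eval(str)
  raise ArgumentError, "unexpected characters in input" unless str.match?(ALLOWED_INPUT)
  eval(str)
rescue SyntaxError => e
  # SyntaxError is not a StandardError, so it must be rescued explicitly.
  raise ArgumentError, "malformed array literal: #{e.message}"
end

p guarded_eval("[1,2,[3,4,[5,6]],7]") #=> [1, 2, [3, 4, [5, 6]], 7]
```

With letters, quotes, dots and parentheses excluded, method calls such as system('ls') are rejected before evaluation. Ruby's own literal rules still apply, though, so a number with a leading zero is read as octal.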
