
I am working on a parser that is currently way too slow for my needs (about 40x slower than I would like) and would like advice on ways to increase its speed. I have tried, and am currently using, a custom regex parser as well as a custom parser using the StringScanner class. I've heard a lot of positive comments about Treetop, and have considered combining the regexes into one huge regex that would cover all matches, but I would like some feedback from people with experience before I rewrite my parser yet again.

The basic rules of the strings that I am parsing are (BoL = beginning of line, EoL = end of line):

  • 3 segments (BoL operators, message, EoL operators)
  • ~6 BoL operators; BoL operators can be in any order
  • 2 EoL operators; EoL operators can be in any order
  • The quantity of any specific operator can be 0, 1, or >1, but only 1 is used; the rest are removed and discarded
  • Operators in the 'message' section of the string are not captured / removed
  • Whitespace is allowed before & after operators but not required
  • Some BoL operators can have whitespace in their setting

My current regex parser works by running the string through a loop that checks for BoL or EoL operators one at a time and cuts them out, ending the loop when there are no more operators of the given type, like so:

loop do
  input.lstrip!                              # drop leading whitespace (single-line input)
  if input =~ /regex for operator_a/         # placeholder pattern
    # set operator_a
    input.gsub!(/regex for operator_a/, '')
  elsif input =~ /regex for operator_b/
    # set operator_b
    input.gsub!(/regex for operator_b/, '')
  elsif input =~ /regex for operator_c/
    # set operator_c
    # ...and so on for the remaining operators
  else
    break
  end
end

My question: what would be the best way to optimize this code? Treetop, another library/gem that I have not found yet, combining the loops into one huge regex, or something else?

Please restrict all answers and input to the Ruby language. I know that it is not the 'best' tool for this job, but it is the language that I use.

A more specific grammar, with examples, in case that helps. This is for parsing communication commands sent to a game by users; so far the only commands are say and whisper. The beginning-of-line operators accepted are ::{target}, :{adverb}, ={verb}, and #{direction of}. The end-of-line operators are {emoticon} (e.g. :D, :(, :)), which sets the adverb if it is not already set, and end-of-line punctuation, which sets the verb if it is not already set. The character ' is an alias for say, and sayto is an alias for say::. Examples:

':happy::my sword=as# my helm Bol command operators work.

{:action=>:say, :adverb=>"happily", :verb=>"ask", :direction=>"my helm", :message=>"Bol command operators work."}

say yep say works

{:action=>:say, :message=>" yep say works"}

sayto my sword yep sayto works as do EoL operators!:)

{:action=>:say, :target=>"my sword", :adverb=>"happily", :verb=>"say", :message=>"yep sayto works as do EoL operators!"}

whisper::my friend : happy Bol command operators work with whisper.

{:action=>:whisper, :target=>"my friend", :adverb=>"happily", :message=>"Bol command operators work with whisper."}

whisp:happy::tinkerbell and they work in a different order.

{:action=>:whisper, :adverb=>"happily", :target=>"tinkerbell", :message=>"and they work in a different order."}

':bash=exclaim::hammer BoL operators work in this order too.

{:action=>:say, :adverb=>"bashfully", :verb=>"exclaim", :target=>"hammer", :message=>"BoL operators work in this order too."}

sayto bells =say :sad #wontwork Bol > Eol and directed !work with directional? :)

{:action=>:say, :verb=>"say", :adverb=>"sadly", :direction=>"wontwork", :message=>"Bol > Eol and directed !work with directional?"}

'all EoL removed closest to end used and reinserted. !!??!?....... :) ? :(

{:action=>:say, :adverb=>"sadly", :verb=>"ask", :message=>"all EoL removed closest to end used and reinserted?"}
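A minimal sketch of the "one huge regex" idea for the BoL half of the grammar above, built on the StringScanner class mentioned in the question: a single alternation with named groups is matched repeatedly at the scanner position, and `||=` keeps the first occurrence of each operator so repeats are discarded. Several things here are assumptions, not part of the question: the action table `ACTIONS` (including the `whisp` alias, taken from the examples), the rule that a setting is one word optionally preceded by "my " (`SETTING`), and the choice to return raw adverb/verb text ("happy", "as") instead of mapping it to "happily"/"ask", which would need a game-specific lookup. EoL operators are not handled here.

```ruby
require 'strscan'

# Hypothetical action table, tried in order at the start of the line.
# "sayto" must come before "say"; the third element marks actions that
# take an implicit target (sayto behaves like say::).
ACTIONS = [
  [/'/,              :say,     false],
  [/sayto\b/,        :say,     true],
  [/say\b/,          :say,     false],
  [/whisp(?:er)?\b/, :whisper, false]
].freeze

# Assumption: a setting is one word, optionally preceded by "my ".
SETTING = /(?:my\s+)?\w+/

IMPLICIT_TARGET = /\s*(?<target>#{SETTING})/

# One combined regex for every BoL operator. "::" is listed before ":"
# so the two-character operator is tried first; "#" is escaped because
# /x treats a bare # as the start of a comment.
BOL_OP = /
  \s*
  (?:
      ::\s*(?<target>#{SETTING})
    | :\s*(?<adverb>\w+)
    | =\s*(?<verb>\w+)
    | \#\s*(?<direction>#{SETTING})
  )
/x

BOL_KEYS = %i[target adverb verb direction].freeze

# Returns a hash with the action, any BoL settings and the remaining
# message text, or nil if the line does not start with a known action.
def parse_bol(input)
  s = StringScanner.new(input)
  s.skip(/\s*/)

  entry = ACTIONS.find { |re, _action, _implicit| s.scan(re) }
  return nil unless entry

  result = { action: entry[1] }
  result[:target] = s[:target] if entry[2] && s.scan(IMPLICIT_TARGET)

  # Each scan consumes one operator at the current position; the loop
  # stops at the first text that is not an operator (the message).
  while s.scan(BOL_OP)
    key = BOL_KEYS.find { |k| s[k] }
    result[key] ||= s[key] # first occurrence wins, repeats are discarded
  end

  result[:message] = s.rest.strip
  result
end
```

For example, `parse_bol("whisp:happy::tinkerbell and they work in a different order.")` returns `{ action: :whisper, adverb: "happy", target: "tinkerbell", message: "and they work in a different order." }`. Each character of the input is examined roughly once by the scanner, instead of once per operator regex per loop iteration as in the `gsub!` loop.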
  • A small improvement might be had by extracting the regexps into variables before the loop: op_a_re = /regex for operator_a/; loop { ... input =~ op_a_re ... }; this way you call implicit Regexp.new once, instead of once per loop iteration. Although my admittedly very, very simple benchmark only sped up by 5% on 1.9.2. Commented Dec 22, 2011 at 18:13
  • @Amadan Very good point! I had thought of doing that when I got to the code 'clean-up' phase, but hadn't thought about it affecting speed as well. TY Commented Dec 22, 2011 at 18:16
  • Without knowing much about what you're trying to parse, the question is difficult to answer. It sounds more like a real grammar would be beneficial, but that doesn't necessarily mean it would be faster. Without examples, it's tricky to theorize. Commented Dec 22, 2011 at 21:27
  • @JosephRuby Yep, and the grammar is still unclear. Grammars are most easily understood when documented generically. No suggestions at this point, although it looks like some splitting and checking for command-ness might be enough, or switch to something like treetop. Can't tell if the grammar is regular-enough to benefit from TT though. Commented Dec 22, 2011 at 21:39
  • @JosephRuby Still looks like a more-naive split would work, but w/o spending more time on it, not sure. It might even be doable w/ an internal DSL if you make a few concessions, or possibly a thin layer over an internal DSL making a real grammar much easier. Commented Dec 22, 2011 at 22:03
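A minimal way to check a tweak like the one in the first comment on your own Ruby version is the standard library's Benchmark module. The pattern `OP_A_RE` and the sample input `LINES` are hypothetical stand-ins for the question's "regex for operator_a" and real command lines; no particular result is implied, so run it against your own regexes and data.

```ruby
require 'benchmark'

# Hypothetical operator pattern standing in for "regex for operator_a".
OP_A_RE = /\A\s*=\s*(\w+)/

# Hypothetical sample input; use real command lines for real numbers.
LINES = Array.new(100_000) { "=ask hello there" }

# Regex literal written inline inside the loop.
inline_time = Benchmark.realtime do
  LINES.each { |line| line =~ /\A\s*=\s*(\w+)/ }
end

# Regex extracted into a constant before the loop.
hoisted_time = Benchmark.realtime do
  LINES.each { |line| line =~ OP_A_RE }
end

puts format('inline:  %.4f s', inline_time)
puts format('hoisted: %.4f s', hoisted_time)
```

Timings vary between runs; `Benchmark.bmbm`, which does a rehearsal pass first, can reduce warm-up noise.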

1 Answer


Maybe this syntax is useful in your case:

emoti_convert = { ":)" => "happily", ":(" => "sadly" }
re_emoti = Regexp.union(emoti_convert.keys)
str = "It does not work :(. Oh, it does :)!"

p str.gsub(re_emoti, emoti_convert)
#=> "It does not work sadly. Oh, it does happily!"

But if you are trying to define a grammar, this is not the way to go (agreeing with @Dave Newton's comments).
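Building on this answer's Regexp.union trick, a minimal sketch of the EoL rules from the question: the same union, anchored to the end of the string with `\z`, peels trailing emoticons and punctuation off one at a time, keeps the one closest to the end of each kind, and reinserts the kept punctuation mark. `apply_eol`, the emoticon table `EMOTI_ADVERB` and the punctuation-to-verb table `PUNCT_VERB` are hypothetical; the question's examples do not fully pin down the real mapping (for instance, which verb "!" should produce).

```ruby
# Hypothetical lookup tables; adjust to the game's real rules.
EMOTI_ADVERB = { ":)" => "happily", ":(" => "sadly" }.freeze
PUNCT_VERB   = { "?" => "ask", "!" => "exclaim" }.freeze

# One trailing token (an emoticon or a punctuation mark) plus any
# trailing whitespace, anchored to the end of the string.
EOL_TOKEN = /(#{Regexp.union(EMOTI_ADVERB.keys)}|[?!.])\s*\z/

# Takes a hash with a :message (e.g. from a BoL pass) and returns a copy
# with EoL operators removed; settings already present are kept.
def apply_eol(result)
  out = result.dup
  text = out[:message].rstrip
  adverb = punct = nil

  # Peel tokens off the end; the first one seen of each kind is the one
  # closest to the end, so the ones further left are discarded.
  while (m = EOL_TOKEN.match(text))
    token = m[1]
    if EMOTI_ADVERB.key?(token)
      adverb ||= EMOTI_ADVERB[token]
    else
      punct ||= token
    end
    text = m.pre_match.rstrip
  end

  verb = PUNCT_VERB[punct]
  out[:adverb] ||= adverb if adverb
  out[:verb] ||= verb if verb
  out[:message] = punct ? text + punct : text # reinsert the kept mark
  out
end
```

With these tables, `apply_eol({ action: :say, message: "all EoL removed closest to end used and reinserted. !!??!?....... :) ? :(" })` returns `{ action: :say, adverb: "sadly", verb: "ask", message: "all EoL removed closest to end used and reinserted?" }`, matching the question's last example.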


1 Comment

I honestly didn't know that a regex'ed hash could be used in gsub like that; very nice trick! I don't think it will help in this case, though.
