I am working on a parser that is currently far too slow for my needs (about 40x slower than I would like), and I would like advice on how to speed it up. I have tried, and am currently using, a custom regex parser as well as a custom parser built on the StringScanner class. I've heard a lot of positive comments about Treetop, and I have considered combining the regexes into one huge regex that covers all matches, but I would like feedback from people with experience before I rewrite my parser yet again.
The basic rules of the strings that I am parsing are:
- 3 segments (BoL operators, message, EoL operators)
- ~6 BoL operators; they can appear in any order
- 2 EoL operators; they can appear in any order
- Any specific operator can appear 0, 1, or more than 1 times, but only one occurrence is used; the rest are removed and discarded
- Operators in the 'message' section of the string are neither captured nor removed
- Whitespace is allowed before and after operators but is not required
- Some BoL operators can have whitespace in their setting
My current regex parser runs the string through a loop that checks for BoL or EoL operators one at a time and cuts them out, ending the loop when there are no more operators of the given type, like so:
loop {
  if input =~ /^\s+/ then input.gsub!(/^\s+/, '') end
  if input =~ /regex for operator_a/
    # sets operator_a
    input.gsub!(/regex for operator_a/, '')
  elsif input =~ /regex for operator_b/
    # sets operator_b
    input.gsub!(/regex for operator_b/, '')
  elsif input =~ /regex for operator_c/
    # sets operator_c
    # etc. for the remaining operators
  else
    break
  end
}
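One small variation on the loop above, as a minimal sketch: `sub!` returns nil when nothing matched, so a single call can both test for an operator and remove it, instead of matching with `=~` and then matching again with `gsub!`. The three patterns below (`@word`, `%word`, `!word`) are hypothetical stand-ins, since the real operator regexes are not shown here, and letting a later occurrence overwrite an earlier one is an assumption as well.

```ruby
# Hypothetical patterns standing in for the real operator regexes.
# Each is anchored with \A so it can only match at the start of the input.
OPERATORS = {
  operator_a: /\A\s*@(\w+)/,
  operator_b: /\A\s*%(\w+)/,
  operator_c: /\A\s*!(\w+)/
}.freeze

def strip_leading_operators(input)
  input = input.dup # leave the caller's string untouched
  found = {}
  loop do
    matched = OPERATORS.any? do |name, pattern|
      # sub! returns nil when nothing matched, so one call tests and removes.
      next false unless input.sub!(pattern, '')
      # Assumption: a repeated operator overwrites the earlier value.
      found[name] = Regexp.last_match(1)
      true
    end
    break unless matched
  end
  [found, input.lstrip]
end

found, rest = strip_leading_operators('@foo %bar  hello @baz')
p found
p rest
```

The `@baz` at the end stays in the message because every pattern is anchored to the start of the remaining input.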
My question: what would be the best way to optimize this code? Treetop, another library/gem that I have not found yet, combining the loops into one huge regex, or something else?
Please restrict all answers and input to the Ruby language. I know that it is not the 'best' tool for this job, but it is the language that I use.
More specific grammar and examples, in case that helps. This is for parsing communication commands sent to a game by users; so far the only commands are say and whisper. The beginning-of-line operators accepted are ::{target}, :{adverb}, ={verb}, and #{direction of}. The end-of-line operators are {emoticon} (e.g. :D :( :) ), which sets the adverb if it is not already set, and end-of-line punctuation, which sets the verb if it is not already set. The character ' is an alias for say, and sayto is an alias for say::. Examples:
':happy::my sword=as# my helm Bol command operators work.
{:action=>:say, :adverb=>"happily", :verb=>"ask", :direction=>"my helm", :message=>"Bol command operators work."}
say yep say works
{:action=>:say, :message=>" yep say works"}
sayto my sword yep sayto works as do EoL operators!:)
{:action=>:say, :target=>"my sword", :adverb=>"happily", :verb=>"say", :message=>"yep sayto works as do EoL operators!"}
whisper::my friend : happy Bol command operators work with whisper.
{:action=>:whisper, :target=>"my friend", :adverb=>"happily", :message=>"Bol command operators work with whisper."}
whisp:happy::tinkerbell and they work in a different order.
{:action=>:whisper, :adverb=>"happily", :target=>"tinkerbell", :message=>"and they work in a different order."}
':bash=exclaim::hammer BoL operators work in this order too.
{:action=>:say, :adverb=>"bashfully", :verb=>"exclaim", :target=>"hammer", :message=>"BoL operators work in this order too."}
sayto bells =say :sad #wontwork Bol > Eol and directed !work with directional? :)
{:action=>:say, :verb=>"say", :adverb=>"sadly", :direction=>"wontwork", :message=>"Bol > Eol and directed !work with directional?"}
'all EoL removed closest to end used and reinserted. !!??!?....... :) ? :(
{:action=>:say, :adverb=>"sadly", :verb=>"ask", :message=>"all EoL removed closest to end used and reinserted?"}
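For reference, the "one huge regex" idea could look roughly like the sketch below: a single alternation over the four BoL sigils, applied repeatedly with StringScanner so the string is walked once and never rewritten. It rests on loud assumptions: every value is a single word (the real grammar allows values such as "my sword"), the command word (say, whisper, ') has already been stripped, and values are returned raw, so "happy" is not expanded to "happily". The names `BOL_OPERATOR`, `BOL_KEYS`, and `parse_bol` are made up for illustration.

```ruby
require 'strscan'

# One alternation covering the four BoL sigils: ::target, :adverb, =verb, #direction.
# Assumption: each value is a single word (\w+), unlike the real grammar.
BOL_OPERATOR = /\s*(?:::\s*(\w+)|:\s*(\w+)|=\s*(\w+)|#\s*(\w+))/
# Capture group n + 1 in BOL_OPERATOR corresponds to BOL_KEYS[n].
BOL_KEYS = [:target, :adverb, :verb, :direction].freeze

def parse_bol(input)
  scanner = StringScanner.new(input)
  result = {}
  # scan only matches at the current position, so the loop stops at the
  # first text that is not an operator.
  while scanner.scan(BOL_OPERATOR)
    BOL_KEYS.each_with_index do |key, i|
      value = scanner[i + 1]
      result[key] = value if value
    end
  end
  result[:message] = scanner.rest.lstrip
  result
end

p parse_bol(':happy::tinkerbell and they work in a different order.')
```

Because the scanner only advances a position, no intermediate strings are created for the removed operators, which is the main thing this approach changes compared with repeated `gsub!` calls.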
op_a_re = /regex for operator_a/; loop { ... input =~ op_a_re ... }; this way you call the implicit Regexp.new once, instead of once per loop iteration. Although my admittedly very simple benchmark only sped up by 5% on 1.9.2.
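A comparison along those lines can be reproduced with the standard Benchmark module. This is only a sketch: the pattern, the sample string, and the iteration count are made-up stand-ins, and the timings will vary by Ruby version and machine. As far as I know, Ruby builds a regex literal without interpolation only once anyway, which would be consistent with the difference being small.

```ruby
require 'benchmark'

# Hypothetical stand-in for the operator_a regex, hoisted into a constant.
OP_A_RE = /\A\s*@(\w+)/
SAMPLE = '@foo some message'
N = 200_000

# Regex literal written inline at the point of use.
def literal_each_time(input)
  input =~ /\A\s*@(\w+)/
end

# Same regex, referenced through the constant defined once above.
def hoisted(input)
  input =~ OP_A_RE
end

Benchmark.bm(10) do |bm|
  bm.report('literal') { N.times { literal_each_time(SAMPLE) } }
  bm.report('hoisted') { N.times { hoisted(SAMPLE) } }
end
```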