10

In order to try to implement a PEG in JavaScript that doesn't make older browsers crash from stack overflows, I would like to make a parsing expression grammar that parses a string in a non-recursive way. How do you do this? It feels mind bending.

Say you have a structure like this:

  • A grammar has many expressions
  • An expression has many matchers
  • A matcher has many tokens (or whatever is a better word for it)
  • A token can either point to another expression, or be a primitive string or regular expression. So if it points to another expression, this is where the recursion starts.

So say you define the hierarchy like this:

var grammar = new Grammar('math');
var expression = grammar.expression;

expression('math')
  .match(':number', ':operator', ':number', function(left, operator, right){
    switch (operator) {
      case '+': return left + right;
      case '-': return left - right;
      case '*': return left * right;
      case '/': return left / right;
    }
  });

expression('number')
  .match(/\d+/, parseInt);

expression('operator')
  .match('+')
  .match('-')
  .match('*')
  .match('/');

var val = grammar.parse('6*8'); // 42

When you call grammar.parse, it starts at the root expression (which has the same name as it, so "math"). Then it iterates through each matcher, then each token, and if the token is an expression, recurses. Basically this (the parser will keep track of the offset/position of the string it's matching the patterns against; this is just pseudocode):

function parse(str, expression, position) {
  var result = [];

  expression.matchers.forEach(function(matcher){
    matcher.tokens.forEach(function(token){
      var val;
      if (token.expression) {
        val = parse(str, token.expression, position);
      } else {
        val = token.parse(str, position);
      }
      if (val) result.push(val);
    });
  });

  return result;
}

parse('6*8', grammar.root, 0);

So for a simple expression like 6*8 there's very little recursion, but you could quickly get to a complex expression with many levels of nesting. Plus multiply the nesting by all those nested for loops, and the stack becomes large (I don't actually use forEach, I use for loops, but in the for loop it calls a function most of the time, so it ends up being pretty much the same).

The question is, how do you "flatten this out"? Instead of doing recursion, how do you make this so it's essentially like this:

while (token = stack.pop()) {
  val = token.parse(val);
  if (val) result.push(val);
}

I'm not looking for the details of how to implement a solution to this specific PEG problem, I am more just looking for the general way you keep track of the recursive state in a non-recursive way.

7
  • It seems like you have to keep track of the depth of each type of object separately, in some sort of pass-through object (like new Context(grammar)). So it would have properties expressionStack, matcherStack, and tokenStack, and keep track of the indices for each. But then that seems complicated, so not sure if that is the way... Commented Apr 20, 2014 at 1:05
  • The most general solution is a source-to-source compiler from Javascript to Javascript that converts the input program to continuation-passing style with a trampoline. I have no idea whether anyone's written such a beast. Commented Apr 20, 2014 at 2:57
  • That seems complex too :). This is what I have so far, but (a) it's starting to seem more complex to the point where it's hard to reason about, and (b) it's making it harder to handle edge cases. github.com/parsejs/grammar/blob/master/lib/context.js Commented Apr 20, 2014 at 3:08
  • vs. an older version with recursion: github.com/parsejs/grammar/blob/… Commented Apr 20, 2014 at 3:10
  • 1
    @LancePollard The more you flatten it the more compelex it becomes .. Recursion can be done using Stack\Queue + Loop, build a Language and a Parser, build a complex Finite State Machine, build a more complex Turing machine Commented Apr 20, 2014 at 4:35

2 Answers 2

1

In general, what you do is you write a stack in code and you put your “local” variables in a “stack frame” context object that you keep on that stack. Then, where you would have a “recursive call”, you store the current stack frame and create a new one for the new current context. Doing the “return” is just a matter of reversing the operation. It's not especially complicated, but it does make the code a bit of a mess. The only thing to watch out for is that you get to the bottom of the stack at the point when you've finished parsing the expression (so that trailing tokens and missing tokens don't cause problems).

It's very much like what happens with a stack maintained in machine code, except you're not restricted to primitive values and can make things quite a lot neater (at the data structure level) as a consequence.

If you've got the time, consider writing (or using someone else's) LR(1) parser. Those maintain very little system stack and handle a number of evil cases in grammars better than your home-rolled LL(k) grammar. However, they're considerably more mysterious in how they work than what you've got now.

Sign up to request clarification or add additional context in comments.

Comments

1

I am more just looking for the general way you keep track of the recursive state in a non-recursive way.

Use pushing and popping in stacks (arrays).
And it would be easier if you had goto's.
A (factorial) approach in VBA (more clear because of goto's).

Option Explicit
Sub Main()
  MsgBox fac(1)
  MsgBox fac(5)
End Sub
Function fac(n&)
  Dim answer&, level&, stackn&(99)
  level = 0
zentry:
  If n = 1 Then answer = 1: GoTo zreturn
  level = level + 1 ' push n
  stackn(level) = n
  n = n - 1 ' call fac(n-1)
  GoTo zentry
zreturn:
  If level = 0 Then fac = answer: Exit Function
  n = stackn(level) ' pop n
  level = level - 1
  answer = n * answer ' factorial
  GoTo zreturn
End Function

The same approach in javascript.

console.log(fac(1));
console.log(fac(5));
function fac(n) { // non-recursive
  var answer, level; 
  var stackn = []; 
  level = 0;
  while (true) { // no goto's
    if (n == 1) { answer = 1; break; }
    level = level + 1; // push n
    stackn[level] = n;
    n = n - 1; } // call fac(n-1) 
  while (true) { // no goto's
    if (level == 0) { return answer; }
    n = stackn[level]; // pop n
    level = level - 1;
    answer = n * answer; } // factorial
  }

2 Comments

This might have been the first time I heard 'more clear' and 'goto' used in the same sentence ;-) maybe it would be cleaner if you used the while loops properly and put the guard where it belongs... while(n != 1) { ... } answer = 1;
@Heuster - Noted. Just wanted you to know that I certainly thought of while(n != 1) { .. but went with the (to me) more like the VBA code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.