7

I am trying to create a very simple parser for an if-else type structure that will build and execute a SQL statement.

Rather than testing conditions for executing statements, I would be testing conditions to build a string.

An example of a statment would be:

select column1
from
#if(VariableA = Case1)
table1
#else if(VariableA = Case2)
table2
#else
defaultTable
#end

If VariableA equals Case1 the resulting string should be: select column1 from table1

A more complex example would be with nested if statements:

select column1
from
#if(VariableA = Case1)
#if(VariableB = Case3)
    table3
#else
    table4
#else if(VariableA = Case2)
table2
#else
defaultTable
#end

This is where I am really having trouble, I can't think of a good way to identify each if-else-end group correctly.

Also, I'm not sure what a good way to keep track of if the string in the "else" clause should evaluate to true.

I have been looking around the net at different types of parsing algorithms, all of them seeming very abstract and complex.

Are there any suggestions of a good place to start for this non-computer science major?

6
  • Wait, are you parsing a string or building a string? Commented Feb 14, 2012 at 19:55
  • I am parsing a string AND building a string. The output of the parsing will be a new string that contains the desired pieces of the SQL query. Commented Feb 14, 2012 at 19:55
  • 4
    Is there any reason why you are creating a language of your own and don't use already existing language, like C#? Commented Feb 14, 2012 at 19:56
  • Also, if you're building your own language and a parser for it, you should know at least the basics of the theory about grammars and formal languages. Commented Feb 14, 2012 at 20:02
  • Yes - this is for a reporting application that is designed to be entirely dynamic. I don't want to re-write and re-deploy my application every time I write a new report. Commented Feb 14, 2012 at 20:03

3 Answers 3

11

I wrote a simple parser, which I tested against the example you provided. If you want to know more about parsing I suggest you to read Compiler Construction from Niklaus Wirth.

The first step is always to write down the syntax of your language in an appropriate way. I have chosen EBNF, which is very simple to understand.

| separates alternatives.

[ and ] enclose options.

{ and } denote repetitions (zero, one or more).

( and ) group expressions (not used here).

This description is not complete but the link I provided describes it in more detail.

The EBNF syntax

LineSequence = { TextLine | IfStatement }.
TextLine     = <string>.
IfStatement  = IfLine LineSequence { ElseIfLine LineSequence } [ ElseLine LineSequence ] EndLine.
IfLine       = "#if" "(" Condition ")".
ElseLine     = "#else".
ElseIfLine   = "#else" "if" "(" Condition ")".
EndLine      = "#end".
Condition    = Identifier "=" Identifier.
Identifier   = <letter_or_underline> { <letter_or_underline> | <digit> }.

The parser follows closely the syntax, i.e. a repetition is translated into a loop, an alternative into an if-else statement, and so on. The syntax elements (productions) are realized by methods that can be called recursively. This reflects the recursive nature of the syntax. E.g., an IfStatement can contain yet another IfStatement in one if its branches.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Example.SqlPreprocessor;

class Parser
{
    enum Symbol
    {
        None,
        LPar,
        RPar,
        Equals,
        Text,
        NumberIf,
        If,
        NumberElse,
        NumberEnd,
        Identifier
    }

    List<string> _input; // Raw SQL with preprocessor directives.
    int _currentLineIndex = 0;

    // Simulates variables used in conditions
    readonly Dictionary<string, string> _variableValues = new() {
                { "VariableA", "Case1" },
                { "VariableB", "CaseX" }
            };

    Symbol _sy; // Current symbol.
    string _string; // Identifier or text line;
    readonly Queue<string> _textQueue = new(); // Buffered text parts of a single line.
    int _lineNo; // Current line number for error messages.
    string _line; // Current line for error messages.

    /// <summary>
    /// Get the next line from the input.
    /// </summary>
    /// <returns>Input line or null if no more lines are available.</returns>
    string GetLine()
    {
        if (_currentLineIndex >= _input.Count) {
            return null;
        }
        _line = _input[_currentLineIndex++];
        _lineNo = _currentLineIndex;
        return _line;
    }

    /// <summary>
    /// Get the next symbol from the input stream and stores it in _sy.
    /// </summary>
    void GetSy()
    {
        string s;
        if (_textQueue.Count > 0) { // Buffered text parts available, use one from these.
            s = _textQueue.Dequeue();
            switch (s.ToLower()) {
                case "(":
                    _sy = Symbol.LPar;
                    break;
                case ")":
                    _sy = Symbol.RPar;
                    break;
                case "=":
                    _sy = Symbol.Equals;
                    break;
                case "if":
                    _sy = Symbol.If;
                    break;
                default:
                    _sy = Symbol.Identifier;
                    _string = s;
                    break;
            }
            return;
        }

        // Get next line from input.
        s = GetLine();
        if (s == null) {
            _sy = Symbol.None;
            return;
        }

        s = s.Trim(' ', '\t');
        if (s[0] == '#') { // We have a preprocessor directive.
                           // Split the line in order to be able get its symbols.
            string[] parts = Regex.Split(s, @"\b|[^#_a-zA-Z0-9()=]");
            // parts[0] = #
            // parts[1] = if, else, end
            switch (parts[1].ToLower()) {
                case "if":
                    _sy = Symbol.NumberIf;
                    break;
                case "else":
                    _sy = Symbol.NumberElse;
                    break;
                case "end":
                    _sy = Symbol.NumberEnd;
                    break;
                default:
                    Error("Invalid symbol #{0}", parts[1]);
                    break;
            }

            // Store the remaining parts for later.
            for (int i = 2; i < parts.Length; i++) {
                string part = parts[i].Trim(' ', '\t');
                if (part != "") {
                    _textQueue.Enqueue(part);
                }
            }
        } else { // We have an ordinary SQL text line.
            _sy = Symbol.Text;
            _string = s;
        }
    }

    void Error(string message, params object[] args)
    {
        // Make sure parsing stops here
        _sy = Symbol.None;
        _textQueue.Clear();
        _input.Clear();

        message = String.Format(message, args) + $" in line {_lineNo}\r\n\r\n{_line}";
        Output("------");
        Output(message);
        Console.WriteLine("!!! Error: " + message);
    }

    /// <summary>
    /// Writes the processed line to a (simulated) output stream.
    /// </summary>
    /// <param name="line">Line to be written to output</param>
    static void Output(string line)
    {
        Console.WriteLine(line);
    }

    /// <summary>
    /// Starts the parsing process.
    /// </summary>
    public void Parse()
    {
        // Simulate an input stream.
        _input = [
                    "select column1",
                    "from",
                    "#if(VariableA = Case1)",
                    "    #if(VariableB = Case3)",
                    "        table3",
                    "    #else",
                    "        table4",
                    "    #end",
                    "#else if(VariableA = Case2)",
                    "    table2",
                    "#else",
                    "    defaultTable",
                    "#end"
                ];

        // Clear previous parsing
        _textQueue.Clear();
        _currentLineIndex = 0;

        // Get first symbol and start parsing
        GetSy();
        if (LineSequence(true)) { // Finished parsing successfully.
                                  //TODO: Do something with the generated SQL
        } else { // Error encountered.
            Output("*** ABORTED ***");
        }
    }

    // The following methods parse according the the EBNF syntax.

    bool LineSequence(bool writeOutput)
    {
        // EBNF:  LineSequence = { TextLine | IfStatement }.
        while (_sy is Symbol.Text or Symbol.NumberIf) {
            if (_sy == Symbol.Text) {
                if (!TextLine(writeOutput)) {
                    return false;
                }
            } else { // _sy == Symbol.NumberIf
                if (!IfStatement(writeOutput)) {
                    return false;
                }
            }
        }
        return true;
    }

    bool TextLine(bool writeOutput)
    {
        // EBNF:  TextLine = <string>.
        if (writeOutput) {
            Output(_string);
        }
        GetSy();
        return true;
    }

    bool IfStatement(bool writeOutput)
    {
        // EBNF:  IfStatement = IfLine LineSequence { ElseIfLine LineSequence } [ ElseLine LineSequence ] EndLine.
        if (IfLine(out bool result) && LineSequence(writeOutput && result)) {
            writeOutput &= !result; // Only one section can produce an output.
            while (_sy == Symbol.NumberElse) {
                GetSy();
                if (_sy == Symbol.If) { // We have an #else if
                    if (!ElseIfLine(out result)) {
                        return false;
                    }
                    if (!LineSequence(writeOutput && result)) {
                        return false;
                    }
                    writeOutput &= !result; // Only one section can produce an output.
                } else { // We have a simple #else
                    if (!LineSequence(writeOutput)) {
                        return false;
                    }
                    break; // We can have only one #else statement.
                }
            }
            if (_sy != Symbol.NumberEnd) {
                Error("'#end' expected");
                return false;
            }
            GetSy();
            return true;
        }
        return false;
    }

    bool IfLine(out bool result)
    {
        // EBNF:  IfLine = "#if" "(" Condition ")".
        result = false;
        GetSy();
        if (_sy != Symbol.LPar) {
            Error("'(' expected");
            return false;
        }
        GetSy();
        if (!Condition(out result)) {
            return false;
        }
        if (_sy != Symbol.RPar) {
            Error("')' expected");
            return false;
        }
        GetSy();
        return true;
    }

    private bool Condition(out bool result)
    {
        // EBNF:  Condition = Identifier "=" Identifier.
        string variable;
        string expectedValue;

        result = false;
        // Identifier "=" Identifier
        if (_sy != Symbol.Identifier) {
            Error("Identifier expected");
            return false;
        }
        variable = _string; // The first identifier is a variable.
        GetSy();
        if (_sy != Symbol.Equals) {
            Error("'=' expected");
            return false;
        }
        GetSy();
        if (_sy != Symbol.Identifier) {
            Error("Value expected");
            return false;
        }
        expectedValue = _string;  // The second identifier is a value.

        // Search the variable
        if (_variableValues.TryGetValue(variable, out string variableValue)) {
            result = variableValue == expectedValue; // Perform the comparison.
        } else {
            Error("Variable '{0}' not found", variable);
            return false;
        }

        GetSy();
        return true;
    }

    bool ElseIfLine(out bool result)
    {
        // EBNF:  ElseIfLine = "#else" "if" "(" Condition ")".
        result = false;
        GetSy(); // "#else" already processed here, we are only called if the symbol is "if"
        if (_sy != Symbol.LPar) {
            Error("'(' expected");
            return false;
        }
        GetSy();
        if (!Condition(out result)) {
            return false;
        }
        if (_sy != Symbol.RPar) {
            Error("')' expected");
            return false;
        }
        GetSy();
        return true;
    }
}

Note that the nested if-statements are processed automatically in a quite natural way. First, the grammar is expressed recursively. A LineSequence can contain IfStatments and IfStatments contain LineSequences. Second, this results in syntax processing methods that call each other in a recursive manner. The nesting of syntax elements is therefore translated into recursive method calls.

Sign up to request clarification or add additional context in comments.

2 Comments

<3 Olivier, thank you very much for such a complete and informative solution!
This is an old answer and the C# language has evolved since. I just applied some code fixes suggested by C#12 code analyzers.
3

Take a look at Irony:

Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs. Irony's scanner and parser modules use the grammar encoded as c# class to control the parsing process. See the expression grammar sample for an example of grammar definition in c# class, and using it in a working parser.

Comments

1

I recommend you use an existing code generator like ... C# or T4 templates or ASP.NET MVC partial views.

But if you want to do this yourself you need some kind of recursion (or a stack which is equivalent). It could work like this:

string BuildCode(string str)
{
 foreach(Match ifMatch in Regex.Matches("#if(?<condition>[^\n\r]*)[\r\n]*(?<body>.*?)#endif)
 {
  var condition = ifMatch.Groups["condition"].Value;
  return EvaluateCondition(condition) ? BuildCode(ifMatch.Value) : null;
 }
}

This is pseudo-code. You need to think through this yourself. This also does not support an else branch but you can add that easily.

Here is a new answer: Use the CodeDom to compile a C# function. You can use the ful power of C# but have the C# code stored in a database or so. That way you don't have to redeploy.

1 Comment

The that regex didn't take into account the else statement. If all your conditions have the form x = y, in C# .NET, easier to first use: Regex ifElse = new Regex(@"#if\s*[(]\s*(?<var0Name>[^=) ]*)\s*=\s*(?<var0Value>[^)]*)\s*[)](?<onIfStatement>[^#]*)(?<elseIfs>(?:#else if[^#]*)*)(?:#else(?<onElseStatement>[^#]*))?(?:#end)"). Then, process the inside "else ifs" with a similar regex for #else if, and issue one simple .Matches statement - it will return the array of results, where each entry is an array containing the "whole match" and the match groups (i.e. variable, value, etc).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.