I have a data interpretation algorithm and actual data. Using this algorithm, I have to interpret the data and display it as a report.

For this, I first need to create a form that accepts some variable values from the user. The variables are defined in pseudocode as below (one example given).

AGEYEARS {
Description: Age in Years
Type: Range;
MinVal: 0;
MaxVal: 124;
Default: 0;
ErrorAction: ERT1:=04 GRT4:=960Z; 
}

I have several variables like this in my Variables.txt file. I don't want to use StreamReader, read the file line by line, and interpret the variables myself.

Instead, I am looking for some logic that can read XXXX { } as one object and Type: Range as Attribute: Value. This way, I can skip the step of reading the file and converting it to understandable code.

Similarly, I also have other files that contain conditions to check. For example: IF SEX = '9' THEN SEX:=U ENDIF

Is there any way to interpret them easily and faster? Can someone help me with it?

I am using C# as my programming language.

  • so you need to build a 4GL to load existing data definitions to make that workflow available in a .net app? Commented Jul 10, 2014 at 22:20

2 Answers

So you need a parser for a DSL.

I'd recommend ANTLR, which lets you build a grammar easily.

Here's a totally untested simple grammar for it:

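grammar ConfigFile;

file: object+ EOF;
object: ID '{' property+ '}';
property: ID ':' value ';';
// A value is every token up to the terminating ';', so it may itself
// contain ':' (as in "ERT1:=04").
value: ~';'+;

ID: [a-zA-Z][a-zA-Z0-9_]*;
WS: [ \t\r\n]+ -> channel(HIDDEN);
CHAR: .;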

Alternative solution: you could also use a regex:

(?<id>\w+)\s*\{\s*(?:(?<prop>\w+)\s*:\s*(?<value>.+?)\s*;\s*)*\}

Then extract the captured information. For each match, you'll have a group id with the name of the object. The groups prop and value will have multiple captures, each pair defining a property.

In C#:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = @"
AGEYEARS {
    Description: Age in Years;
    Type: Range;
    MinVal: 0;
    MaxVal: 124;
    Default: 0;
    ErrorAction: ERT1:=04 GRT4:=960Z;
}

OTHER {
    Foo: Bar;
    Bar: Baz;
}";

// One match per "NAME { ... }" block; every "key: value;" pair inside it
// is recorded as a capture of the prop and value groups.
var re = new Regex(@"(?<id>\w+)\s*\{\s*(?:(?<prop>\w+)\s*:\s*(?<value>.+?)\s*;\s*)*\}");

foreach (Match match in re.Matches(text))
{
    Console.WriteLine("Object {0}:", match.Groups["id"].Value);

    // The n-th prop capture belongs to the n-th value capture.
    var properties = match.Groups["prop"].Captures.Cast<Capture>();
    var values = match.Groups["value"].Captures.Cast<Capture>();

    foreach (var property in properties.Zip(values, (prop, value) => new {name = prop.Value, value = value.Value}))
    {
        Console.WriteLine("    {0} = {1}", property.name, property.value);
    }

    Console.WriteLine();
}

This solution is not as "pretty" as the parser one, but it works without any external library.
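If you'd rather end up with objects than console output, the same regex can fill a dictionary per block, which is roughly the "XXXX { } as one object, Attribute: Value" shape the question asks for. This is only a sketch: ParseDefinitions is a made-up helper name, everything is kept as strings, and it assumes every property ends with a semicolon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library): maps each block name
// to a dictionary of that block's "key: value;" pairs.
static Dictionary<string, Dictionary<string, string>> ParseDefinitions(string text)
{
    var re = new Regex(@"(?<id>\w+)\s*\{\s*(?:(?<prop>\w+)\s*:\s*(?<value>.+?)\s*;\s*)*\}");
    var result = new Dictionary<string, Dictionary<string, string>>();

    foreach (Match match in re.Matches(text))
    {
        var names = match.Groups["prop"].Captures.Cast<Capture>().Select(c => c.Value);
        var values = match.Groups["value"].Captures.Cast<Capture>().Select(c => c.Value);

        // Pair the n-th name with the n-th value.
        var properties = new Dictionary<string, string>();
        foreach (var pair in names.Zip(values, (n, v) => new { Name = n, Value = v }))
        {
            properties[pair.Name] = pair.Value;
        }

        result[match.Groups["id"].Value] = properties;
    }

    return result;
}

var definitions = ParseDefinitions(@"
AGEYEARS {
    Type: Range;
    MinVal: 0;
    MaxVal: 124;
    ErrorAction: ERT1:=04 GRT4:=960Z;
}");

// Values stay strings; convert them where the form needs numbers.
var age = definitions["AGEYEARS"];
int maxAge = int.Parse(age["MaxVal"]);
Console.WriteLine("{0} accepts values up to {1}", age["Type"], maxAge);
```

A duplicate property name inside one block simply overwrites the earlier value here; whether that is acceptable depends on your files.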

2 Comments

Hi, ANTLR looks close to what I was searching for. I don't wish to use regex, as it would become complex later; I am only looking for a clean solution. Basically, I am new to domain-specific language translation, so I need some more help in this area. I purchased the ANTLR 4 reference guide a little while ago and started reading it. I installed ANTLR4 via NuGet in Visual Studio, created a .g4 file with your code, and compiled it too. Now I want to give my Variables grammar file as input and get the C# code for it. Can you please help me with this? Is there a place where I can see a step-by-step guide?
If you have the reference guide then you already have the best source of info. Oh, and there is an ANTLR plugin for Visual Studio which eases the compilation process.

I advise against using regular expressions. They may work at the start, but if your task becomes a bit more complex, regex may be unable to solve your problem, because it technically cannot do this.

The better choice (at the price of adding a library) is to use a parser. For C# there might not be as many as for other languages, but there are enough -- just take your pick :-). You have Irony, Coco/R, GOLD, ANTLR, LLLPG, Sprache, or my NLT.

If you sense that you will have mathematical precedence issues (i.e. you will have to evaluate expressions like "5+5*2", which should give 15, not 20), then first compare the syntax of top-down parsers (ANTLR is one of them) with that of bottom-up parsers (NLT, for example). Usually in the former you have to write the rules in a quirky order (you have to embed the rules), while in the latter you just set their order (stating that * goes before +). In other words, rules are separated from precedence.
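To make "embedding the rules" concrete, here is a minimal hand-written top-down (recursive descent) sketch, not tied to any of the libraries above. Evaluate and its nested functions are hypothetical names, and it assumes input made only of non-negative integers, + and *, with no whitespace and no error handling. Precedence comes purely from which rule calls which: the tighter-binding * lives in the deeper rule.

```csharp
using System;

// Hypothetical mini-evaluator: integers, '+' and '*' only.
static int Evaluate(string input)
{
    int pos = 0;

    // number: one or more digits
    int Number()
    {
        int start = pos;
        while (pos < input.Length && char.IsDigit(input[pos])) pos++;
        return int.Parse(input.Substring(start, pos - start));
    }

    // term: number ('*' number)*
    // '*' binds tighter, so its rule sits deeper in the call chain.
    int Term()
    {
        int value = Number();
        while (pos < input.Length && input[pos] == '*')
        {
            pos++;
            value *= Number();
        }
        return value;
    }

    // expr: term ('+' term)*
    // '+' is the loosest operator, so it is the outermost rule.
    int Expr()
    {
        int value = Term();
        while (pos < input.Length && input[pos] == '+')
        {
            pos++;
            value += Term();
        }
        return value;
    }

    return Expr();
}

Console.WriteLine(Evaluate("5+5*2")); // prints 15
```

Adding an operator with a new precedence level means adding another nested rule, which is the "quirky order" cost described above.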

1 Comment

As of ANTLR v4, you no longer have to use quirky syntax to encode the precedence rules. You just write alternatives in a rule in precedence order.
