How to write a parser using JavaScript?

Question

In our product, we are trying to parse the following formats out of a given piece of text:

${{node::123456}}
${{node:123456}}
$fn{{#functionName('abcd',',',' somethingWithASpace')}}
$fn{{#functionName('abcd','#','${{node::123456}}')}}
${{rmtrqst:someText[]->abcd}}

A sample of the text looks like this:

Hi, how are you ${{node::123456}}? Your order id is ${{node::636636}}.

or

Your order was placed on $fn{{#dateConverterFunction('abcd','#','${{node::123456}}')}}

I tried the regex /\$((fn)\{{2}(\#|)(\w*)(($.*$)|([^\$]*))\}{2})/gi, but it does not help much. Can anyone suggest how to write a parser for this?

The grammar could be described like this:

Every expression starts with $ followed by either fn{{ or {{
After that there will be a string like node or #functionName or something else
That string might be followed by a parenthesis-enclosed string (which may itself contain a whole expression such as ${{node::1234}}); whatever is inside the parentheses should be ignored
Finally, the expression is closed by }}

The first step would be to define a grammar. This is heavily underspecified.
– Ingo Bürk, Commented Jul 10, 2020 at 4:47
The grammar could be like this - 1. Every expression starts with $ followed by either fn{{ or {{ 2. After that there will be a string like node or #functionName or something else 3. That string might be followed by a parenthesis-enclosed string (which may contain a whole expression like ${{node::1234}}); we should ignore whatever is inside the parentheses 4. Finally it will be closed by }}
– Saikat Bhattacharya, Commented Jul 10, 2020 at 4:54
I doubt that a single regex is going to work in all possible cases here. You might just start with a simple regex to isolate ${{}} and then parse whatever is inside it...
– joshstrike, Commented Jul 10, 2020 at 5:38
"A string like node" is again by far not specific enough. What characters? Can function calls be nested, ...
– Ingo Bürk, Commented Jul 10, 2020 at 9:18
@IngoBürk - No, function calls are not nested. 'A string like node' means ${{node
– Saikat Bhattacharya, Commented Jul 10, 2020 at 16:38

Charlie · Accepted Answer · 2020-07-10 05:33:47Z

Use a tokenizer and let it break the strings down to a meaningful structure.

The nearley.js library is a popular choice for parsing nested structures like yours. You can keep your expressions simple or, if you need more, the library can build an abstract syntax tree for a more complicated grammar.

To write a parser using the library, define your grammar in a separate file and use it for parsing.

Alternatively, you can use the tokenizer directly (moo, in the example below) to get your string tokenized.

@{%
const moo = require("moo");

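// moo tries rules in the order they are listed, so `times` has to come
// before `word`; otherwise a lone "x" would be lexed as a word.
const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  times:  /\*|x/,
  word:   /[a-z]+/
});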
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

# Use %token to match any token of that type instead of "token":
multiplication -> %number %ws %times %ws %number {% ([first, , , , second]) => first * second %}

# Literal strings now match tokens with that text:
trig -> "sin" %number
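
If pulling in a library feels heavy for the formats in the question, the same tokenizing idea can be sketched by hand. What follows is a minimal sketch in plain JavaScript and is not part of nearley or moo; the names findClose, findExpressions and parseFunctionBody are made up for illustration. It assumes the grammar as described in the question: an expression opens with ${{ or $fn{{, closes with }}, and whatever sits inside parentheses is skipped. It also assumes that arguments are single-quoted strings without escaped quotes, which the question does not state.

```javascript
// Hypothetical helper: index of the "}}" that closes the expression whose
// body starts at `from`, or -1 if there is none.
function findClose(text, from) {
  let depth = 0;       // current parenthesis nesting
  let inQuote = false; // inside a single-quoted argument (assumed: no escapes)
  for (let j = from; j < text.length; j++) {
    const ch = text[j];
    if (inQuote) {
      if (ch === "'") inQuote = false;
    } else if (depth > 0 && ch === "'") {
      inQuote = true;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
    } else if (depth === 0 && text.startsWith("}}", j)) {
      return j;
    }
  }
  return -1;
}

// Scan `text` and return every top-level expression, in order of appearance.
function findExpressions(text) {
  const results = [];
  let i = 0;
  while (i < text.length) {
    const isFn = text.startsWith("$fn{{", i);
    const isRef = !isFn && text.startsWith("${{", i);
    if (!isFn && !isRef) {
      i += 1;
      continue;
    }
    const bodyStart = i + (isFn ? 5 : 3); // skip "$fn{{" or "${{"
    const close = findClose(text, bodyStart);
    if (close === -1) {
      i += 1; // unterminated: treat the "$" as ordinary text
      continue;
    }
    results.push({
      type: isFn ? "fn" : "ref",
      body: text.slice(bodyStart, close),
      start: i,
      end: close + 2, // index just past the closing "}}"
    });
    i = close + 2;
  }
  return results;
}

// Split a function body such as "#name('a',',','b')" into name and arguments.
// Returns null when the body does not look like a function call.
function parseFunctionBody(body) {
  const m = /^#(\w+)\((.*)\)$/s.exec(body);
  if (!m) return null;
  const args = Array.from(m[2].matchAll(/'([^']*)'/g), (a) => a[1]);
  return { name: m[1], args };
}

const sample =
  "Your order was placed on $fn{{#dateConverterFunction('abcd','#','${{node::123456}}')}}";
for (const expr of findExpressions(sample)) {
  console.log(expr.type, expr.type === "fn" ? parseFunctionBody(expr.body) : expr.body);
}
```

For the sample above this finds a single fn expression named dateConverterFunction with three arguments, the last of which is the untouched text ${{node::123456}}. Tracking parentheses and quotes while looking for the closing }} is the part a single regex struggles with, because the nested expression ends with the same delimiter as the outer one.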
