
In our product we are trying to parse the following formats from a given piece of text -

  1. ${{node::123456}}
  2. ${{node:123456}}
  3. $fn{{#functionName('abcd',',',' somethingWithASpace')}}
  4. $fn{{#functionName('abcd','#','${{node::123456}}')}}
  5. ${{rmtrqst:someText[]->abcd}}

Sample texts look like this -

  1. Hi, how are you ${{node::123456}}? Your order id is ${{node::636636}}.

or

  1. Your order was placed on $fn{{#dateConverterFunction('abcd','#','${{node::123456}}')}}

I tried the regex /\$((fn)\{{2}(\#|)(\w*)((\(.*\))|([^\$]*))\}{2})/gi, but it is not helping much. Can anyone suggest how to write a parser for this?

A grammar could be like this -

  1. Every expression starts with $ followed by either fn{{ or {{
  2. After that there will be a string like node or #functionName or something else
  3. That might be followed by a parenthesis-enclosed string (this may contain a whole expression like ${{node::1234}} inside it; we should ignore whatever is inside the parentheses)
  4. Finally it will be closed by }}
  • The first step would be to define a grammar. This is heavily underspecified. Commented Jul 10, 2020 at 4:47
  • The grammar could be like this: (1) every expression starts with $ followed by either fn{{ or {{; (2) after that there will be a string like node or #functionName or something else; (3) that might be followed by a parenthesis-enclosed string (this may contain a whole expression like ${{node::1234}} inside it; we should ignore whatever is inside the parentheses); (4) finally it will be closed by }}. Commented Jul 10, 2020 at 4:54
  • I doubt that a single regex is going to work in all possible cases here. You might start with a simple regex to isolate ${{}} and then parse whatever is inside it... Commented Jul 10, 2020 at 5:38
  • "A string like node" is again by far not specific enough. What characters? Can function calls be nested, ... Commented Jul 10, 2020 at 9:18
  • @IngoBürk - no, function calls are not nested; 'a string like node' means ${{node Commented Jul 10, 2020 at 16:38

1 Answer

Use a tokenizer and let it break the strings down into a meaningful structure.

The nearley.js library is a popular choice for parsing non-linear structures like yours. You can keep your expressions simple or, if you choose otherwise, have the library build an abstract syntax tree for a more complicated grammar.

To write a parser using the library, define your grammar in a separate file and use it for parsing.

Or you can use the tokenizer directly to get your string tokenized.

@{%
const moo = require("moo");

const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  word: /[a-z]+/,
  times:  /\*|x/
});
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

# Use %token to match any token of that type instead of "token":
multiplication -> %number %ws %times %ws %number {% ([first, , , , second]) => first * second %}

# Literal strings now match tokens with that text:
trig -> "sin" %number
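
If pulling in a library feels heavy for the five formats in the question, a small hand-written scanner may be enough. The following is only a sketch of that alternative, not nearley or moo code: the names extractExpressions and findClose are hypothetical, and it assumes (per the question and comments) that expressions start with ${{ or $fn{{, end with }}, that function calls are not nested, and that anything inside parentheses or single quotes should be skipped when looking for the closing }}.

```javascript
// Hypothetical helper: returns the index of the "}}" that closes an expression
// whose body starts at `from`, or -1 if there is none.
// Assumption: text inside (...) is ignored, and single-quoted strings inside
// the parentheses may themselves contain ")" or "}}".
function findClose(text, from) {
  let depth = 0;       // current parenthesis depth
  let inQuote = false; // inside a single-quoted argument
  for (let j = from; j < text.length; j++) {
    const ch = text[j];
    if (inQuote) {
      if (ch === "'") inQuote = false;
      continue;
    }
    if (depth > 0 && ch === "'") {
      inQuote = true;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
    } else if (depth === 0 && text.startsWith("}}", j)) {
      return j;
    }
  }
  return -1;
}

// Hypothetical helper: scans `text` and collects every top-level
// ${{...}} or $fn{{...}} expression with its position and inner body.
function extractExpressions(text) {
  const results = [];
  let i = 0;
  while (i < text.length) {
    const isFn = text.startsWith("$fn{{", i);
    const isPlain = !isFn && text.startsWith("${{", i);
    if (!isFn && !isPlain) {
      i++;
      continue;
    }
    const bodyStart = i + (isFn ? 5 : 3); // length of "$fn{{" or "${{"
    const close = findClose(text, bodyStart);
    if (close === -1) {
      i++; // unterminated opener: treat it as ordinary text
      continue;
    }
    results.push({
      kind: isFn ? "function" : "variable",
      start: i,
      end: close + 2,
      raw: text.slice(i, close + 2),
      body: text.slice(bodyStart, close),
    });
    i = close + 2; // resume after the closing "}}"
  }
  return results;
}

const sample =
  "Your order was placed on $fn{{#dateConverterFunction('abcd','#','${{node::123456}}')}}";
console.log(extractExpressions(sample).map((e) => e.raw));
```

Because the scanner skips quoted and parenthesized text, the inner ${{node::123456}} in the sample is reported as part of the function expression rather than as a separate match. The body of each match can then be parsed further, for example by handing it to a tokenizer.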