0

I am writing a custom parser using regular expressions, but I can't work out how to match functions.

An example of a function in my custom language is:

function int add(int num1, int num2){
  return num1 + num2;
}

My tokenizer uses a regex to get the next token and remove it from the source code string supplied earlier. This means that when it comes to parsing a function, I can be sure that the code will start with the function keyword. I currently have the following expression:

^([\s]*function[\s]+[a-zA-Z][a-zA-Z0-9]*[\s]+[a-zA-Z][a-zA-Z0-9]*[\s]*\(([\s]*[a-zA-Z][a-zA-Z0-9]*[\s]+[a-zA-Z][a-zA-Z0-9]*[\s]*)*\)[\s]*\{.*\}.*)$

It is very long, but it successfully matches these two functions:

function void log(string msg){
  Console.log(msg);
}

and

function int add(int num1 int num2){
  return num1 + num2;
}

I want to be able to split the arguments by a comma.

  • I could make the comma required after every parameter, but then the last parameter would have to end with a comma.

  • I could make the comma optional after a parameter, but then the user could leave the commas out entirely.

I need the comma to be required only between parameters; otherwise it will mess up my parser later. How can I edit my expression to require a comma between parameters?

Thank you very much for your time.

3
  • Can you tell me exactly which function you want to add the comma to? Commented Aug 15, 2016 at 0:51
  • Forget it. You can use a regex to pull individual tokens out of the string, but I wouldn't even try to parse entire constructs using a regex. For one thing, there's no way to use a regex to parse a variable number of parameters and extract all the information. Commented Aug 15, 2016 at 0:53
  • I just want to match (int num1, int num2). The relevant part of my expression is \(([\s]*[a-zA-Z][a-zA-Z0-9]*[\s]+[a-zA-Z][a-zA-Z0-9]*[\s]*)*\). I need to check that the parameters are separated by commas. Commented Aug 15, 2016 at 1:49
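
The approach suggested in the second comment (a regex for individual tokens, ordinary code for the structure) could look roughly like the sketch below. It assumes Python, which the question does not specify, and the names TOKEN, tokenize and parse_params are hypothetical. Only identifiers, parentheses and commas are tokenized, which is enough for a parameter list.

```python
import re

# Hypothetical token pattern: optional leading whitespace, then either an
# identifier (same shape as in the question) or one of ( ) ,
TOKEN = re.compile(r"\s*(?:(?P<ident>[a-zA-Z][a-zA-Z0-9]*)|(?P<punct>[(),]))")
PUNCT = {"(", ")", ","}

def tokenize(src):
    """Split a parameter list into identifier and punctuation tokens."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens

def parse_params(tokens):
    """Parse '(' [type name {',' type name}] ')' into [(type, name), ...]."""
    if not tokens or tokens[0] != "(":
        raise SyntaxError("expected '('")
    params, i = [], 1
    if i < len(tokens) and tokens[i] == ")":
        return params  # empty parameter list
    while True:
        if i + 1 >= len(tokens) or tokens[i] in PUNCT or tokens[i + 1] in PUNCT:
            raise SyntaxError("expected 'type name'")
        params.append((tokens[i], tokens[i + 1]))
        i += 2
        if i < len(tokens) and tokens[i] == ",":
            i += 1  # a comma must be followed by another parameter
            continue
        if i < len(tokens) and tokens[i] == ")":
            return params
        raise SyntaxError("expected ',' or ')'")
```

With this split, a missing comma or a trailing comma is rejected by the parsing loop rather than by the regex, and each parameter comes out as a separate (type, name) pair.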

2 Answers

1

This regex should work for the (int num1, int num2) part of the string. Note that it requires at least one parameter, so it will not match an empty list ():

(\((?:\s*[^\s,]+\s+[^\s,]+\s*,)*\s*[^\s,]+\s+[^\s,]+\s*\))

It's easier to read when you space it out (this layout only works as a pattern if your regex engine has a free-spacing/extended mode):

(\(
 (?:      \s*
   [^\s,]+\s+
   [^\s,]+\s*,
 )*       \s*
   [^\s,]+\s+
   [^\s,]+\s*
\))
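
One way to check this pattern is to run it against the examples from the question. The sketch below assumes Python's re module, since the question does not name a language; PARAM_LIST and is_valid_param_list are illustrative names. The re.VERBOSE flag lets the spaced-out layout be used as written.

```python
import re

# The pattern from this answer. With re.VERBOSE, whitespace and comments
# inside the pattern are ignored.
PARAM_LIST = re.compile(r"""
    \(
    (?: \s* [^\s,]+ \s+ [^\s,]+ \s* , )*   # zero or more "type name," groups
        \s* [^\s,]+ \s+ [^\s,]+ \s*        # final "type name" without a comma
    \)
""", re.VERBOSE)

def is_valid_param_list(text):
    """True if the whole string is a parenthesised, comma-separated list."""
    return PARAM_LIST.fullmatch(text) is not None

print(is_valid_param_list("(int num1, int num2)"))   # commas between parameters
print(is_valid_param_list("(int num1 int num2)"))    # missing comma
print(is_valid_param_list("(int num1, int num2,)"))  # trailing comma
```

The first call is accepted and the other two are rejected. Because [^\s,]+ is looser than the question's [a-zA-Z][a-zA-Z0-9]*, a string such as (a) b) also passes; substituting the stricter identifier pattern avoids that.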

0

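You could think of it as three different cases: zero parameters, one parameter, and more than one parameter. Then combine the cases with the or operator (|). The backslashes below are doubled, presumably because the pattern is meant to sit inside a string literal; in a raw regex, write \w and \s.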

One Parameter:

(?:\\w+\\s+\\w+)

More than one parameter (whitespace around the comma is optional):

(?:\\w+\\s+\\w+)(?:\\s*,\\s*(?:\\w+\\s+\\w+))+

Zero Parameters:

\\s*

Combining all of the above with the or operator:

((?:\\w+\\s+\\w+)|(?:\\w+\\s+\\w+)(?:\\s*,\\s*(?:\\w+\\s+\\w+))+|\\s*)
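
The three cases can also be folded into one pattern: an optional first parameter followed by any number of ", parameter" repetitions. The sketch below is one way to write that, assuming Python and the identifier pattern from the question; IDENT, PARAM, PARAMS and split_params are illustrative names, not part of the original answers.

```python
import re

IDENT = r"[a-zA-Z][a-zA-Z0-9]*"   # identifier pattern taken from the question
PARAM = rf"{IDENT}\s+{IDENT}"     # one "type name" pair

# Either nothing between the parentheses, or one parameter followed by
# zero or more ", parameter" repetitions. A comma is only legal between
# two parameters.
PARAMS = re.compile(rf"\(\s*(?:{PARAM}(?:\s*,\s*{PARAM})*\s*)?\)")

def split_params(text):
    """Return [(type, name), ...], or None if text is not a valid list."""
    if PARAMS.fullmatch(text) is None:
        return None
    # The list is known to be well formed, so each pair can be pulled out.
    return re.findall(rf"({IDENT})\s+({IDENT})", text)

print(split_params("(int num1, int num2)"))  # two (type, name) pairs
print(split_params("()"))                    # empty list
print(split_params("(int num1 int num2)"))   # None: comma missing
```

Validating first and extracting second sidesteps the problem that a repeated capture group only keeps its last match.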
