Can anyone help me find the method names in JavaScript files using regular expressions?
-
Can you give an example of what you're trying to do? – Daniel Vandersluis, Oct 4, 2010 at 13:38
-
For example, I have a file called file1.js which contains about 100 methods. I have developed a tool in which I need to find these 100 methods. – Hemant Kumar, Oct 4, 2010 at 13:45
-
To find them I used a basic approach: I read the complete file using a file stream and stored it in a string array. After that I split the array, searched for the string "function", and used indexing to find the names. – Hemant Kumar, Oct 4, 2010 at 13:47
-
But this is a tedious task, so I want to use regular expressions. – Hemant Kumar, Oct 4, 2010 at 13:48
-
Use a proper tokenizer - regex alone is not enough. – Amarghosh, Oct 4, 2010 at 13:50
1 Answer
(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()
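As a rough illustration (not part of the original answer), here is one way the pattern above could be applied in JavaScript itself. The names `findCallLikeNames`, `findDeclaredNames`, and the narrower `DECLARATION` pattern are assumptions made for this sketch. As written, the pattern accepts any identifier followed by `(`, so it reports calls as well as definitions.

```javascript
// The pattern from the answer, with the global flag so every match is returned.
const CALL_LIKE = /(?!function\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()/g;

// Returns every identifier that is directly followed by "(".
function findCallLikeNames(source) {
  return Array.from(source.matchAll(CALL_LIKE), (m) => m[1]);
}

// Hypothetical narrower variant: only names that follow the `function` keyword.
const DECLARATION = /\bfunction\s+([_$a-zA-Z][_$a-zA-Z0-9]*)\s*\(/g;

function findDeclaredNames(source) {
  return Array.from(source.matchAll(DECLARATION), (m) => m[1]);
}

const sample =
  "function add(a, b) { return a + b; }\n" +
  "function show() { log(add(1, 2)); }";

console.log(findCallLikeNames(sample)); // ["add", "show", "log", "add"]
console.log(findDeclaredNames(sample)); // ["add", "show"]
```

The first result also contains `log` and the second `add` because both are followed by `(`. Keywords such as `if (` would be reported for the same reason. If only definitions are wanted, anchoring on the `function` keyword as in the second pattern is one option, at the cost of missing methods assigned as `name = function () {}`.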
There are many issues you can run into when trying to parse JavaScript with a regexp. First, there are a few things that a lexer would normally discard.
WhiteSpace LineTerminator Comment
Now the concept of white space is not as simple as a space character. Here is a full list of characters that must be covered in your regexp.
WhiteSpace:
'\u0009'
'\u000b'
'\u000c'
'\u0020'
'\u00a0'
'\u1680'
'\u180e'
'\u2000'
'\u2001'
'\u2002'
'\u2003'
'\u2004'
'\u2005'
'\u2006'
'\u2007'
'\u2008'
'\u2009'
'\u200a'
'\u202f'
'\u205f'
'\u3000'
'\ufeff'
Right off the bat, our regexp has ballooned in complexity. Next comes the LineTerminator production, which once again is not as simple as you would think.
LineTerminator:
'\u000a'
'\u000d'
'\u2028'
'\u2029'
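To make the two lists above concrete, here is a minimal sketch that turns them into one explicit character class and uses it between the `function` keyword and the name. The helper names (`toEscape`, `buildSeparatorClass`, `findNamesWithExplicitSeparators`) are hypothetical, and the identifier part is still the simple ASCII one from the regexp at the top.

```javascript
// Code points copied from the WhiteSpace and LineTerminator lists above.
const WHITE_SPACE = [
  0x0009, 0x000b, 0x000c, 0x0020, 0x00a0, 0x1680, 0x180e,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x202f, 0x205f, 0x3000, 0xfeff,
];
const LINE_TERMINATOR = [0x000a, 0x000d, 0x2028, 0x2029];

// Turn a code point into a \uXXXX escape usable inside a character class.
function toEscape(codePoint) {
  return "\\u" + codePoint.toString(16).padStart(4, "0");
}

// Hypothetical helper: one character class covering both lists.
function buildSeparatorClass() {
  return "[" + WHITE_SPACE.concat(LINE_TERMINATOR).map(toEscape).join("") + "]";
}

const SEP = buildSeparatorClass();
const DECLARATION_WITH_SEP = new RegExp(
  "\\bfunction" + SEP + "+([_$a-zA-Z][_$a-zA-Z0-9]*)" + SEP + "*\\(",
  "g"
);

// Returns the names that follow `function`, allowing any listed separator.
function findNamesWithExplicitSeparators(source) {
  return Array.from(source.matchAll(DECLARATION_WITH_SEP), (m) => m[1]);
}

console.log(findNamesWithExplicitSeparators("function\na() {}\nfunction\u00a0b\u2003() {}"));
// ["a", "b"]
```

In JavaScript engines `\s` is specified to cover the language's own WhiteSpace and LineTerminator productions, so spelling the class out like this matters mainly when the regexp runs in an engine whose `\s` is narrower.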
I won't go into more detail, but here are two examples of perfectly valid function definitions.
function
a() {
}
function /*Why is this comment here!!!*/ a() {
}
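One way to cope with both examples above, sketched here under a loud assumption: comments are removed with a naive pre-pass before matching. This breaks on string or regexp literals that contain `/*` or `//`, which is exactly where a real tokenizer is needed. `stripComments` and `findNamesIgnoringComments` are hypothetical names.

```javascript
// Naive comment removal: block comments first, then line comments.
// Assumption: no string or regexp literal in the source contains "/*" or "//".
function stripComments(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, " ") // keep a space so tokens stay separated
    .replace(/\/\/[^\n\r\u2028\u2029]*/g, "");
}

// Returns the names that follow `function` once comments are out of the way.
function findNamesIgnoringComments(source) {
  const pattern = /\bfunction\s+([_$a-zA-Z][_$a-zA-Z0-9]*)\s*\(/g;
  return Array.from(stripComments(source).matchAll(pattern), (m) => m[1]);
}

const tricky =
  "function\n" +
  "a() {\n" +
  "}\n" +
  "function /*Why is this comment here!!!*/ a() {\n" +
  "}";

console.log(findNamesIgnoringComments(tricky)); // ["a", "a"]
```

Replacing a block comment with a space rather than nothing keeps `function/**/a` from collapsing into `functiona`.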
So we are left with some good news and some bad news. The good news is that my simple regexp will cover most of the common cases. As long as the file is written in a sane manner, it should work just fine. The bad news is that if you want to cover all corner cases, you will be left with a monstrosity of a regexp.
Note
I just wanted to say that a regexp matching every valid function identifier would be particularly horrendous, since identifiers are not limited to ASCII letters and digits.
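For what it's worth, JavaScript engines that support Unicode property escapes (the `u` flag with `\p{...}`) make an approximation of this much shorter than it was when this answer was written. The sketch below is an assumption-laden illustration, not a complete solution: it ignores `\uXXXX` escape sequences inside identifiers and does not reject reserved words. `IDENTIFIER` and `findUnicodeNames` are hypothetical names.

```javascript
// Approximates an identifier: a start character followed by continue characters.
// Not handled: \uXXXX escapes inside identifiers, and reserved words.
const IDENTIFIER = /[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*/u;

// Reuse the identifier source inside a declaration pattern.
const UNICODE_DECLARATION = new RegExp(
  "\\bfunction\\s+(" + IDENTIFIER.source + ")\\s*\\(",
  "gu"
);

// Returns the names that follow `function`, including non-ASCII ones.
function findUnicodeNames(source) {
  return Array.from(source.matchAll(UNICODE_DECLARATION), (m) => m[1]);
}

const unicodeSample =
  "function caf\u00e9() {}\n" +
  "function \u03a9mega () {}\n" +
  "function $_x1(){}";

console.log(findUnicodeNames(unicodeSample));
```

The two extra code points in the second character class are the zero-width non-joiner and zero-width joiner, which the language allows inside identifiers after the first character.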