Can anyone help me find method names in JavaScript files using regular expressions?

  • Can you give an example of what you're trying to do? Commented Oct 4, 2010 at 13:38
  • For example, if I have a file called file1.js that contains something like 100 methods, the tool I have developed needs to find those 100 methods. Commented Oct 4, 2010 at 13:45
  • To find them, I read the complete file using a file stream and stored it in a string array. After that I split the array, searched for the string "function", and used indexing to find the names. Commented Oct 4, 2010 at 13:47
  • But this is a tedious task, so I want to use regular expressions. Commented Oct 4, 2010 at 13:48
  • Use a proper tokenizer - regex alone is not enough. Commented Oct 4, 2010 at 13:50

1 Answer

(?<=\bfunction\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()

There are many issues you can run into when trying to parse JavaScript with a regexp. First, there are a few things that a lexer would normally discard:

WhiteSpace
LineTerminator
Comment

Now the concept of white space is not as simple as a space character. Here is a full list of characters that must be covered in your regexp.

WhiteSpace:
    '\u0009'
    '\u000c'
    '\u00a0'
    '\u180e'
    '\u2001'
    '\u2003'
    '\u2005'
    '\u2007'
    '\u2009'
    '\u202f'
    '\u3000'
    '\u000b'
    '\u0020'
    '\u1680'
    '\u2000'
    '\u2002'
    '\u2004'
    '\u2006'
    '\u2008'
    '\u200a'
    '\u205f'
    '\ufeff'

Right off the bat our regexp has ballooned in complexity. Now we have the LineTerminator production which once again is not as simple as you would think.

LineTerminator:
    '\u000a'
    '\u000d'
    '\u2028'
    '\u2029'
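To give a feel for what covering these lists means in practice, here is a minimal sketch that builds an explicit character class from the code points listed above instead of relying on `\s`, whose exact membership can differ between engines. The helper names (`toEscape`, `buildClass`, `findFunctionNames`) are made up for this illustration, and the identifier part is deliberately simplified to ASCII.

```javascript
// Code points copied from the WhiteSpace and LineTerminator lists above.
const WHITE_SPACE = [
  0x0009, 0x000b, 0x000c, 0x0020, 0x00a0, 0x1680, 0x180e,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
  0x2007, 0x2008, 0x2009, 0x200a, 0x202f, 0x205f, 0x3000, 0xfeff,
];
const LINE_TERMINATOR = [0x000a, 0x000d, 0x2028, 0x2029];

// Turn a code point into a \uXXXX escape usable inside a character class.
function toEscape(codePoint) {
  return "\\u" + codePoint.toString(16).padStart(4, "0");
}

// Hypothetical helper: builds the source text of a character class.
function buildClass(codePoints) {
  return "[" + codePoints.map(toEscape).join("") + "]";
}

const SKIPPABLE = buildClass(WHITE_SPACE.concat(LINE_TERMINATOR));

// "function", at least one skippable character, a simplified ASCII
// identifier, optional skippable characters, then an opening parenthesis.
const FUNCTION_NAME = new RegExp(
  "\\bfunction" + SKIPPABLE + "+([_$a-zA-Z][_$a-zA-Z0-9]*)" + SKIPPABLE + "*\\(",
  "g"
);

// Collect the captured identifier of every match in the source text.
function findFunctionNames(source) {
  const names = [];
  let match;
  FUNCTION_NAME.lastIndex = 0;
  while ((match = FUNCTION_NAME.exec(source)) !== null) {
    names.push(match[1]);
  }
  return names;
}
```

Calling `findFunctionNames("function\na() {}")` picks up `a` even though the name sits on the next line. Comments between the keyword and the name are still not handled by this sketch.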

I won't go into more detail but here are a few examples of perfectly valid function definitions.

function
a() {

}

function /*Why is this comment here!!!*/ a() {

}

So we are left with some good news and some bad news. The good news is that my simple regexp will cover most of the common cases. As long as the file is written in a sane manner, it should work just fine. The bad news is that if you want to cover every corner case, you will be left with a monstrosity of a regexp.
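As a rough check of that trade-off, the sketch below runs a simple pattern over an ordinary file and over the two unusual definitions shown earlier. It assumes the intent is to capture the identifier that directly follows the `function` keyword, and it assumes an engine with lookbehind support (ES2018, for example a recent Node.js). The name `simpleNames` and the sample strings are made up for the illustration.

```javascript
// Assumed intent: capture the identifier that directly follows `function`.
// The lookbehind requires ES2018 support in the regex engine.
const SIMPLE = /(?<=\bfunction\s+)([_$a-zA-Z][_$a-zA-Z0-9]*)(?=\s*\()/g;

// Return every captured identifier in the given source text.
function simpleNames(source) {
  return Array.from(source.matchAll(SIMPLE), (m) => m[1]);
}

const ordinary = "function first() {}\nfunction second (x) { return first(x); }";
const acrossLines = "function\na() {\n\n}";
const withComment = "function /*Why is this comment here!!!*/ a() {\n\n}";

// Both declarations are found; the call first(x) is not reported.
console.log(simpleNames(ordinary));
// Found, because \s also matches line terminators.
console.log(simpleNames(acrossLines));
// Nothing found: the comment sits between the keyword and the name.
console.log(simpleNames(withComment));
```

The comment case is the kind of corner that pushes a pure regexp approach toward the monstrosity described above.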

Note

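The regexp needed to match every valid function identifier would be particularly horrendous, since identifiers may contain Unicode letters and \uXXXX escape sequences rather than only the ASCII characters used above.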

3 Comments

No. JavaScript syntax is far too complex for regex to parse it reliably.
And not only are regular expressions inadequate for the task of parsing JavaScript; in the general case, locating the "names" of all the functions in a source file is not always even a meaningful concept. Because JavaScript functions are values, looking for the names of all the functions in a module is like looking for the names of all the numbers in a module.
@bobince - You would be surprised at how close you can get. We would never come close to 100%, though. Depending on the project requirements, writing a proper tokenizer may not be worth the effort. From my experience it would take a minimum of 350 lines of F# (using parser combinators) and anywhere between 1000 and 5000 lines of C# (parser generators tend to generate huge code files; if you write it manually you can get away with 1000).
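As a rough illustration of the middle ground discussed in these comments, the sketch below does a single hand-written pass that blanks out comments and string literals before applying a simple pattern. It is far from a proper tokenizer: regular expression literals and template literals are not recognized, and the names `stripCommentsAndStrings` and `functionNames` are invented for this example.

```javascript
// Hypothetical helper: replaces each comment with a single space and each
// string literal with an empty string literal, so that a later pattern
// match only sees code. Regex literals and template literals are NOT handled.
function stripCommentsAndStrings(source) {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: skip up to (not including) the line terminator.
      while (i < source.length && !/[\n\r\u2028\u2029]/.test(source[i])) i++;
      out += " ";
    } else if (ch === "/" && next === "*") {
      // Block comment: skip to the closing */ or to the end of input.
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
      out += " ";
    } else if (ch === '"' || ch === "'") {
      // String literal: skip to the matching quote, honoring backslash escapes.
      i++;
      while (i < source.length && source[i] !== ch) {
        i += source[i] === "\\" ? 2 : 1;
      }
      i++; // step over the closing quote
      out += '""';
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

// Find named function declarations and named function expressions.
function functionNames(source) {
  const code = stripCommentsAndStrings(source);
  const pattern = /\bfunction\s+([_$a-zA-Z][_$a-zA-Z0-9]*)\s*\(/g;
  return Array.from(code.matchAll(pattern), (m) => m[1]);
}
```

With this pass in front, a definition such as `function /*Why is this comment here!!!*/ a() {}` yields `a`, while an assignment like `var f = function () {};` yields no name at all, which is the point made above about functions being values.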
