I want to analyze JavaScript code and find out whether it contains any dynamic tag creation such as document.createElement('script');. I have tried to do this with regular expressions, but they only catch some of the possible formats, so I thought of writing a JavaScript parser that extracts all the keywords, strings, and functions from the JavaScript code.
-
So what exactly is your problem with writing it? – Rene Pot, Mar 29, 2012 at 11:56
-
How do you know it won't call functions that create elements? For example, jQuery can also add new elements to the DOM and your approach right now won't detect that. – Simeon Visser, Mar 29, 2012 at 11:57
-
For now I am just concerned with plain JavaScript. Please suggest some method to do it. – user1275375, Mar 29, 2012 at 11:59
3 Answers
In general there is no way to know whether a given line of code will ever run; you would need to solve the halting problem.
If you restrict your analysis to just finding occurrences of a function call, you don't make much progress. Naive methods are still easy to trick: if you just regex match for document.createElement, you will not match something as simple as document["create" + "Element"]. In general you would need to not only parse the code but evaluate it as well to get around this. And to be sure that the evaluation ever finishes, you would again need to solve the halting problem.
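To make the point about naive matching concrete, here is a small sketch. The sample snippets and the names naivePattern and naiveScan are made up for illustration; they are not from the question.

```javascript
// Naive detection: look for the literal text "document.createElement(".
const naivePattern = /document\.createElement\s*\(/;

// Hypothetical inputs, one per evasion or failure mode.
const samples = {
  direct: "var s = document.createElement('script');",
  computed: "var s = document['create' + 'Element']('script');",
  aliased: "var d = document, f = 'createElement'; var s = d[f]('script');",
  inString: "var msg = \"call document.createElement('script') later\";",
};

// Returns true when the pattern appears anywhere in the source text.
function naiveScan(source) {
  return naivePattern.test(source);
}

for (const [name, source] of Object.entries(samples)) {
  console.log(name, naiveScan(source));
}
```

Only the direct form is a correct hit. The computed and aliased forms slip through, and the inString case is reported even though nothing is ever called there.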
Well, the first rule is: never use regex for big things like this, or the DOM, or ... . You have to parse the code into tokens. The good news is that you don't have to write your own parser. There are a few JavaScript parsers written in JavaScript.
They may be a bit hard to work with, but it is still better to use them. Other projects, such as burrito or code surgeon, are built on these parsers, so you can look at their source code to see how they use them.
But there is bad news too: people can still outsmart other people, let alone the parsers and the code they write. At the very least you need to evaluate the code with some runtime variables and see whether it tries to access the DOM.
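To show what working on tokens buys you over matching raw text, here is a minimal hand-rolled sketch. It is not one of the existing parsers, and tokenize and findCreateElementCalls are hypothetical names. It deliberately ignores regex literals and template literals, which a real parser handles.

```javascript
// One alternative per token kind; comments and whitespace are matched so they can be skipped.
const TOKEN = /(?<comment>\/\/[^\n]*|\/\*[\s\S]*?\*\/)|(?<space>\s+)|(?<string>"(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*')|(?<ident>[A-Za-z_$][\w$]*)|(?<number>\d+(?:\.\d+)?)|(?<punct>[\s\S])/g;

// Split source into { type, value } tokens, dropping comments and whitespace.
// Simplification: regex literals and template literals are not recognized.
function tokenize(source) {
  const tokens = [];
  for (const match of source.matchAll(TOKEN)) {
    const groups = match.groups;
    if (groups.comment !== undefined || groups.space !== undefined) continue;
    const type = Object.keys(groups).find((key) => groups[key] !== undefined);
    tokens.push({ type, value: match[0] });
  }
  return tokens;
}

// Find the token sequence `document . createElement (` and report the first
// argument when it is a string literal, or null when it is anything else.
function findCreateElementCalls(source) {
  const tokens = tokenize(source);
  const hits = [];
  for (let i = 0; i + 3 < tokens.length; i++) {
    const [obj, dot, method, paren] = tokens.slice(i, i + 4);
    const isCall =
      obj.type === "ident" && obj.value === "document" &&
      dot.type === "punct" && dot.value === "." &&
      method.type === "ident" && method.value === "createElement" &&
      paren.type === "punct" && paren.value === "(";
    if (!isCall) continue;
    const arg = tokens[i + 4];
    hits.push(arg && arg.type === "string" ? arg.value.slice(1, -1) : null);
  }
  return hits;
}

const source = [
  "// document.createElement('script') in a comment",
  "var msg = \"document.createElement('script')\";",
  "var s = document . createElement ( 'script' );",
  "var d = document.createElement(tagName);",
].join("\n");

console.log(findCreateElementCalls(source));
```

The comment and the string literal are no longer reported, and odd spacing does not matter. Computed access such as document["create" + "Element"] is still invisible at this level.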
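A rough sketch of that evaluation idea, assuming you run it under Node.js and use its built-in vm module: hand the code a fake document that records calls. traceCreateElement and the stub element are hypothetical, and note that vm is not a security sandbox, so this is not a safe way to run hostile code.

```javascript
const vm = require("vm");

// Run the source against a stand-in `document` that records every
// createElement call instead of touching a real DOM.
function traceCreateElement(source, timeoutMs = 100) {
  const created = [];
  const fakeDocument = {
    createElement(tagName) {
      created.push(String(tagName).toLowerCase());
      // Inert stand-in element so simple follow-up calls do not throw.
      return { tagName, setAttribute() {}, appendChild() {} };
    },
  };
  let error = null;
  try {
    // The timeout stops scripts that never finish; it does not make them safe.
    vm.runInNewContext(source, { document: fakeDocument }, { timeout: timeoutMs });
  } catch (err) {
    error = err.message;
  }
  return { created, error };
}

// Computed property access is caught because the code actually runs.
console.log(traceCreateElement("var s = document['create' + 'Element']('script');"));

// A branch that is not taken on this run is never observed.
console.log(traceCreateElement("if (false) { document.createElement('script'); }"));
```

This catches the string-concatenation trick, but it only sees the paths that execute on that particular run, and a script that loops forever just hits the timeout.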