0

Note: the goal here is not lexical analysis, so please do not suggest lexing or parsing code. My apologies for adding to the pile of "regex comments" questions, but the best (most voted) answer I found is a bad one given how the result would be used in that question (though I was able to start from there), and many of the other answers I've reviewed are simply irrelevant to what I'm trying to do.

I've built a regex which works in principle as expected here.


/(?:\n|^)(?:[^'"])*?(?:'(?:[^\\\r\n]|[\\]{2}|\\')*'|"(?:[^\\\r\n]|[\\]{2}|\\")*")*?(?:[^'"])*?(\/\*(?:[\s\S]*?)\*\/)/g

The final group matches block comments well, as referenced in the SO answer mentioned above:

(\/\*(?:[\s\S]*?)\*\/)

Everything preceding the actual match is discarded, but used for the purpose of matching a valid block comment - i.e. not something found in a string literal.

Ignore the case where a regex can look like a block comment.

Assume that the input string is linted, not free-form javascript.


But in practice, I'm getting a duplicate on the first match and no other matches.

Why? And how might it be corrected to work in practice?

Thanks in advance for your help and any trouble the question may put you through. :)

Also (in the comments section), any potential pitfalls are welcome, given the information below.

Extra information irrelevant to the direct question: the ultimate goal, as hinted in the example code, is to replace/collapse any nested or other code structures so as to focus on the variable declarations at the top of the lexical scope for a given patch of code. The purpose is hoisting variable declarations, to generate a template for a specific use case. I know that sounds like a lot, but I believe it is possible and relatively straightforward - NOT ENTIRELY WITH SIMPLE REPLACEMENT - but nonetheless. To be specific about what I mean by "possible": I would prefer to collapse only regexes, block comments and inline comments (EDIT: and string literals), then recursively collapse only variable scopes (or plain objects) in {blocks} (those which do not contain any nested blocks) until they are gone, then see what's left. If it seems like this won't work for any reason, please respond only in comments. Thank you!

10
  • You would have to look at the top level parser .. code. If it does C/C++ comments style first, does it exclude quotes or not. Is it possible html can get in the way? Commented Jul 7, 2015 at 1:16
  • @sln, String literals yes good point, I'll edit that in. And html, there will not be any. Commented Jul 7, 2015 at 1:18
  • I can give you a bullet-proof regex that does all C/C++ comment processing. Is that what you need? Commented Jul 7, 2015 at 1:18
  • As long as this is JavaScript only, i.e. no html, it will work. Is that the case? Commented Jul 7, 2015 at 1:24
  • @sln Yes, given that regex literals should be out of the way first, but that's another issue - basically the part of this regex that isn't quite working would also serve that purpose. But in any case, no html is present in the code - also given that string literals will be gone. :) Commented Jul 7, 2015 at 1:28

1 Answer 1

1

This is one of those "ugh, yeah, of course!" moments.

The exec() function will generate an array with 1 element, being the matched element. Except it doesn't: the first element is the full match, which is great unless there are capture groups. If there are, then in addition to result[0] being the full pattern match, result[1] will be the first capture group, result[2] the second, and so on.

For example:

  1. (/l/g).exec("l") gives us ["l"]
  2. (/(l)/g).exec("l") gives us ["l", "l"]
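The same shape holds with more than one group, and non-capturing groups add nothing to the array. A minimal sketch, assuming a made-up input string (`"key=value"` is only an illustration, not something from the question):

```javascript
// Hypothetical sample input, used only for illustration.
const input = "key=value";

// No capture groups: the result holds just the full match.
const plain = /\w+=\w+/.exec(input);
console.log(Array.from(plain)); // ["key=value"]

// Two capture groups: index 0 is still the full match,
// indexes 1 and 2 hold the groups in order.
const grouped = /(\w+)=(\w+)/.exec(input);
console.log(Array.from(grouped)); // ["key=value", "key", "value"]

// A non-capturing group (?:...) takes part in the match
// but adds no entry of its own to the array.
const nonCapturing = /(?:\w+)=(\w+)/.exec(input);
console.log(Array.from(nonCapturing)); // ["key=value", "value"]
```

So text consumed by `(?:...)` is still part of element 0; it is only absent from the numbered groups.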

Your RE isn't so much the problem (although running the string through a stream filter that takes out block comments is probably easier to work with); it's more that the assumption that you can just use .join() on the exec results has been causing you problems. If you have capture groups, and you have a result, join results.slice(1), or call results.shift() before joining to get rid of the leading element, so you don't accidentally include the full match.
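To make that concrete, here is a rough sketch of reading one capture group from every match by calling exec() in a loop. It uses only the block-comment group from the question rather than the full pattern, and `collectGroup`, `source` and the sample text are hypothetical names chosen for illustration:

```javascript
// Collect one capture group from every match of a global regex.
// `collectGroup` is a hypothetical helper name, not part of any library.
function collectGroup(re, text, groupIndex) {
  if (!re.global) {
    throw new Error("collectGroup expects a regex with the g flag");
  }
  const found = [];
  re.lastIndex = 0; // start from the beginning of the text
  let result;
  while ((result = re.exec(text)) !== null) {
    found.push(result[groupIndex]);
    // Guard against zero-length matches, which would otherwise loop forever.
    if (result[0].length === 0) {
      re.lastIndex++;
    }
  }
  return found;
}

// The block-comment group from the question, on its own.
const blockComment = /(\/\*(?:[\s\S]*?)\*\/)/g;
// Made-up sample input.
const source = "var a = 1; /* first */\nvar b = 2; /* second */";

const comments = collectGroup(blockComment, source, 1);
console.log(comments); // ["/* first */", "/* second */"]

// Joining a single exec() result without slicing repeats the match,
// because element 0 (full match) and element 1 (group) are identical here.
blockComment.lastIndex = 0;
const single = blockComment.exec(source);
console.log(single.join(""));          // "/* first *//* first */"
console.log(single.slice(1).join("")); // "/* first */"
```

A single exec() call only ever reports one match, so a loop like this (or resetting and advancing `lastIndex` yourself) is needed to reach the later comments.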


3 Comments

Hmm, ok, I tried string match, and that works better, but still not discarding (?:), why would that be? jsfiddle.net/375t3cLL/4
Try jsbin.com/xelamotabo/edit?html,js,output, although this suggests your RE is not doing the continued match quite right (it's getting the var b = function(){ part, for instance, and the "invalid" part on the last section)
Yeah, I'm onto it. Thanks Mike, you get the gold star for today. :) I'll remember .exec() in the future.
