
Hello, I am writing a Java program to remove all comments from a string that contains PHP source code. Can anyone give me a regular expression for PHP comments?

3 Answers

3

Have a look at this link: http://ostermiller.org/findcomment.html

He arrives at this solution (for /* ... */ comments):

sourcecode = sourcecode.replaceAll("/\\*(?:.|[\\n\\r])*?\\*/", "");
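Because Java strings are immutable, `replaceAll` returns a new string and leaves `sourcecode` unchanged, so the result has to be assigned. A minimal runnable sketch of the block-comment case, assuming the whole PHP source is already in one `String` (class and method names are hypothetical). It uses the DOTALL flag `(?s)` so that `.` also matches line breaks; this matches the same text as the `(?:.|[\\n\\r])` alternation, but avoids the deep backtracking recursion that a repeated alternation can trigger in `java.util.regex` on very long comments.

```java
import java.util.regex.Pattern;

public class BlockCommentStripper {
    // (?s) turns on DOTALL, so '.' also matches \n and \r inside the comment.
    // The lazy .*? stops at the first "*/", so two comments on one line are
    // removed separately instead of swallowing the code between them.
    private static final Pattern BLOCK_COMMENT = Pattern.compile("(?s)/\\*.*?\\*/");

    public static String stripBlockComments(String source) {
        // replaceAll returns a new String; the input is left unchanged.
        return BLOCK_COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String php = "<?php\n/* header\n   comment */\n$a = 1; /* inline */ $b = 2;\n";
        System.out.print(stripBlockComments(php));
    }
}
```

Like the one-liner, this is not string-aware, so the caveats listed below still apply.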

For // ... and # ... comments you should be able to do something like

sourcecode = sourcecode.replaceAll("(//|#)[^\\n\\r]*", "");
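A runnable sketch of the line-comment case (hypothetical names). The pattern stops just before the line break instead of consuming it, so neighbouring lines are not merged, and it also removes a comment on the last line when there is no trailing newline.

```java
import java.util.regex.Pattern;

public class LineCommentStripper {
    // "//" or "#" followed by everything up to, but not including, the line break.
    // Leaving \n and \r in place keeps the original line structure intact.
    private static final Pattern LINE_COMMENT = Pattern.compile("(?://|#)[^\\r\\n]*");

    public static String stripLineComments(String source) {
        return LINE_COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String php = "$a = 1; // set a\n$b = 2; # set b";
        System.out.println(stripLineComments(php));
    }
}
```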

Beware of the following type of situations though:

  • someString = "An example comment: /* example */";

  • someString = "An example comment: // example";

  • someString = "An example comment: # example";
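To see why these cases matter, here is a small demo (hypothetical class name) that applies a line-comment regex of this kind to the second example. The `//` inside the string literal is taken for a comment, so the rest of the line, including the closing quote and semicolon, is deleted and the result is no longer valid PHP.

```java
import java.util.regex.Pattern;

public class NaiveStripPitfall {
    // Same idea as the line-comment regex above: it has no notion of string literals.
    private static final Pattern LINE_COMMENT = Pattern.compile("(?://|#)[^\\r\\n]*");

    static String naiveStrip(String source) {
        return LINE_COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // This line contains no comment at all: the "//" is inside a string.
        String php = "$someString = \"An example comment: // example\";";
        // The closing quote and semicolon are lost along with "// example".
        System.out.println(naiveStrip(php));
    }
}
```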


5 Comments

This will trim neither pound nor double slash comments
Right. It is for /* ... */ comments. Updated answer.
PHP allows you to start a comment with the # sign.
You mention comments inside strings. Those might be legitimately, and significantly, used to compose JavaScript containing conditional compilation directives inside comments: $s = "<script>/* @cc_on */...</script>";
Also watch out for <?php # echo 'simple';?>, as in the example listed at php.net/manual/en/language.basic-syntax.comments.php
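The `?>` case from the last comment can be handled in the regex itself: according to the PHP manual page linked there, a one-line comment ends at the end of the line or at the end of the current PHP block, whichever comes first. A minimal sketch (hypothetical names) uses a negative lookahead so the match stops just before `?>`; like the other one-liners, it is still not string-aware.

```java
import java.util.regex.Pattern;

public class LineCommentWithCloseTag {
    // (?!\?>) is a negative lookahead: a character is consumed only if "?>"
    // does not start at that position, so the closing tag survives.
    private static final Pattern LINE_COMMENT =
            Pattern.compile("(?://|#)(?:(?!\\?>)[^\\r\\n])*");

    public static String strip(String source) {
        return LINE_COMMENT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // The example from the PHP manual's comments page.
        System.out.println(strip("<h1>This is an <?php # echo 'simple';?> example</h1>"));
    }
}
```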
1

Like Spudley said, you cannot simply write a regex to do this. There are too many exceptional cases, such as comment-like text inside strings and line comments terminated early by closing PHP tags. To guarantee correctness, you would have to write an entire language parser.

However, if you're willing to use PHP itself to do the filtering for you, this question has all the answers, and it should be significantly easier and more robust. If you have PHP installed on the same machine as the Java application, you can run PHP using Runtime.exec() and read its console output, or have PHP write the result to a file and import that file into your program later.
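A minimal sketch of the "let PHP do it" route, assuming a PHP CLI binary named `php` is on the PATH and the source is in a file. It uses `ProcessBuilder`, the more configurable counterpart of `Runtime.exec()`, to run `php -w`, the CLI option that prints a file's source with comments and whitespace stripped. Because `-w` also collapses whitespace, the output will not keep the original layout. Class and method names are hypothetical, and `readAllBytes()` needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class PhpCliStripper {
    // Assumption: phpBinary points to a PHP CLI; "php" relies on it being on the PATH.
    static List<String> buildCommand(String phpBinary, String phpFile) {
        // -w: print the file's source with comments and whitespace stripped.
        return Arrays.asList(phpBinary, "-w", phpFile);
    }

    static String stripWithPhp(String phpBinary, String phpFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(phpBinary, phpFile));
        pb.redirectErrorStream(true); // merge stderr so PHP error messages are not lost
        Process process = pb.start();
        String output;
        // Read all output before waiting, so a full pipe buffer cannot block the child.
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("php exited with status " + exit + ": " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java PhpCliStripper path/to/file.php");
            return;
        }
        System.out.print(stripWithPhp("php", args[0]));
    }
}
```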


1

This will be extremely tricky!

For a start, you have three types of comment in PHP: /* ... */, and also // and #.

But you need to exclude any that are part of a string, especially as // can appear quite often in strings, as an escaped slash character, and a # character inside a string could be a perfectly legitimate part of the text.
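One common regex workaround for the string problem, offered as a sketch rather than a complete solution: match string literals and comments in a single alternation, scanning left to right, then keep the literals and drop the comments. Because the engine consumes a whole quoted string before it can look inside it, a `//` or `#` within the string is never treated as a comment. It does not handle Heredoc/Nowdoc, the `?>` terminator, or text outside the PHP tags. Names are hypothetical; `Matcher.replaceAll(Function)` needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringAwareStripper {
    // Alternative 1 (group 1): a double- or single-quoted literal with backslash escapes.
    // Alternative 2: a /* ... */ block comment.
    // Alternative 3: a // or # line comment, up to (not including) the line break.
    // DOTALL lets '.' in the block-comment and escape parts match line breaks too.
    private static final Pattern TOKEN = Pattern.compile(
            "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')"
            + "|/\\*.*?\\*/"
            + "|(?://|#)[^\\r\\n]*",
            Pattern.DOTALL);

    public static String strip(String source) {
        // Keep string literals verbatim; replace comments with nothing.
        // quoteReplacement is needed because PHP strings often contain '$',
        // which replaceAll would otherwise read as a group reference.
        return TOKEN.matcher(source).replaceAll(m ->
                m.group(1) != null ? Matcher.quoteReplacement(m.group(1)) : "");
    }

    public static void main(String[] args) {
        String php = "$s = \"a // b\"; // real\n$t = 'x # y'; # c\n/* block */$u = 1;";
        System.out.println(strip(php));
    }
}
```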

To compound this problem, strings can be multi-line, and in addition to single and double-quotes, they can also be written using Heredoc and Nowdoc syntax (see http://php.net/manual/en/language.types.string.php), which may be particularly tricky to pick out accurately with regex. Plus of course, you need to be sure you're within the <?php ... ?> tags.

It can probably be done, but to be honest I'd say that with all of that to deal with, you'd be far better off using a language parser than regex to try to do this.
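For a sense of what going beyond regex involves, here is a minimal hand-written scanner sketch (hypothetical names; not a real PHP parser). It walks the input once, copies inline HTML outside `<?php ... ?>` untouched, copies quoted string literals whole, and drops `/* */`, `//` and `#` comments, ending line comments at a line break or `?>`. Not handled, by assumption: Heredoc and Nowdoc, the short tags `<?` and `<?=`, an uppercase `<?PHP` tag, and PHP 8 `#[...]` attributes, which it would wrongly remove as `#` comments.

```java
public class PhpCommentScanner {

    public static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        boolean inPhp = false; // outside <?php ... ?> the text is inline HTML, copied verbatim
        int i = 0;
        int n = src.length();
        while (i < n) {
            if (!inPhp) {
                if (src.startsWith("<?php", i)) {
                    inPhp = true;
                    out.append("<?php");
                    i += 5;
                } else {
                    out.append(src.charAt(i++));
                }
                continue;
            }
            char c = src.charAt(i);
            if (src.startsWith("?>", i)) {
                // Closing tag: back to HTML mode.
                inPhp = false;
                out.append("?>");
                i += 2;
            } else if (c == '"' || c == '\'') {
                // Copy a quoted literal, skipping the character after each backslash,
                // so comment-like text inside it is preserved.
                int start = i++;
                while (i < n && src.charAt(i) != c) {
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                i = Math.min(i + 1, n); // include the closing quote if there is one
                out.append(src, start, i);
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2; // an unterminated comment runs to end of input
            } else if (c == '#' || src.startsWith("//", i)) {
                // A line comment ends at a line break or "?>", whichever comes first;
                // the terminator is left in place for the next loop iteration.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r'
                        && !src.startsWith("?>", i)) {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<h1>This is an <?php # echo 'simple';?> example</h1>"));
    }
}
```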
