
I am trying to build a regular expression that matches different types of echo statements. The word echo has already been matched.

Example patterns to be matched

"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;

Pattern for var:

/^[a-zA-Z_][a-zA-Z0-9_]*/

I already have a pattern that matches the first two examples:

/((^"[^"]*"\.{0,1})*;)/
  • Why do you think a recursive approach is better? Why do you need to do this? Maybe there's a better approach. Commented Apr 27, 2014 at 8:57
  • @AmalMurali Because the expression needs to repeat only on encountering a . (dot). Commented Apr 27, 2014 at 9:17
  • After reading your updated question, I can tell regex is not the best way to accomplish this task. You're better off with an actual parser. Take a look at NikiC's PHP parser. Commented Apr 27, 2014 at 9:20
  • I completely agree with you. I realized this after starting it in PHP, and I tried to look at some of the available parsers but couldn't figure out how to make them work. About my... After the text is entered in a textbox and the user clicks the submit button, all of this needs to be done in the background automatically. Commented Apr 27, 2014 at 9:25

3 Answers


In addition to the two other suggestions: if you're looking for PHP PCRE-based regexes to validate a subset of PHP, this can be done in a more structured way by specifying named subpatterns for the tokens you're looking for. Here is an example pattern that matches these statements and also allows whitespace around the tokens (as PHP would), for any US-ASCII-based extended single-byte charset (I think this is how PHP actually treats it, even if your files are UTF-8):

~
(?(DEFINE)
    (?<stringDoubleQuote> "(?:\\"|[^"])+")
    (?<stringSingleQuote> '(?:\\'|[^'])+')
    (?<string> (?:(?&stringDoubleQuote)|(?&stringSingleQuote)))
    (?<variable> \$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))
    (?<varorstring> (?:(?&variable)|(?&string)))
)
^ \s* (?&varorstring) (?: \s* \. \s* (?&varorstring) )* \s* ; $
~x

Thanks to the named subpatterns, it's easy to reuse a token for any string or variable and to add the whitespace handling and the string concatenation operator around it. The pattern is shown as PCRE receives it, so in PHP source it is best written in a nowdoc, which leaves backslashes untouched. With it assigned to $pattern, an example of use is:

$lines = <<<'LINES'
"hiii";
"how"."are"."you";
$var."abc";
"abc".$var;
'how'."how".$var;
LINES;

foreach (explode("\n", $lines) as $subject) {
    $result = preg_match($pattern, $subject);
    if (FALSE === $result) {
        throw new LogicException('PCRE pattern did not compile.');
    }
    printf("%s %s match.\n", var_export($subject, true), $result ? 'did' : 'did not');
}

Output:

'"hiii";' did match.
'"how"."are"."you";' did match.
'$var."abc";' did match.
'"abc".$var;' did match.
'\'how\'."how".$var;' did match.

Demo: https://eval.in/142721
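
The pieces above can be put together into one runnable file. This is a minimal sketch, assuming the pattern is kept in a nowdoc so that every backslash reaches PCRE unchanged. The names echoArgumentPattern() and isEchoArgument() are hypothetical, and the two string subpatterns are my own variation (they use * so that ""; is accepted, and they treat a backslash as escaping the next character), so read it as a variant rather than the exact pattern from the demo:

```php
<?php
// Sketch only: both function names are hypothetical helpers.

// Returns the pattern. A nowdoc (<<<'REGEX') performs no escape processing,
// so the text below is exactly what PCRE receives.
function echoArgumentPattern(): string
{
    return <<<'REGEX'
~
(?(DEFINE)
    (?<stringDoubleQuote> "(?:\\.|[^"\\])*")
    (?<stringSingleQuote> '(?:\\.|[^'\\])*')
    (?<string> (?:(?&stringDoubleQuote)|(?&stringSingleQuote)))
    (?<variable> \$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
    (?<varorstring> (?:(?&variable)|(?&string)))
)
^ \s* (?&varorstring) (?: \s* \. \s* (?&varorstring) )* \s* ; $
~x
REGEX;
}

// True when $subject is a string or variable, optionally followed by
// further ".operand" parts, and terminated by a semicolon.
function isEchoArgument(string $subject): bool
{
    $result = preg_match(echoArgumentPattern(), $subject);
    if ($result === false) {
        // preg_match() returns false when the pattern is invalid or matching fails.
        throw new LogicException('preg_match() reported an error.');
    }
    return $result === 1;
}

foreach (['"hiii";', '$var;', '"";', '"abc" $var;'] as $subject) {
    printf("%s %s\n", var_export($subject, true), isEchoArgument($subject) ? 'did match.' : 'did not match.');
}
```

Keeping the pattern behind a function means the nowdoc is written only once, and the x modifier lets the subpatterns stay on separate, readable lines.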


8 Comments

Genius! I'm not so good at PHP, so I can't understand a few things (though I understood your logic). Can you tell me more about the pattern or point me to a tutorial? I tried to google it but didn't find any proper answers. For instance, why have you used <<<, ~, and DEFINE in the pattern? Does <string> mean you're giving a name to the sub-pattern? Also, how can it be altered to match only $var; or ""; (if possible)? Thanks a lot.
The (?(DEFINE) syntax is a little-known feature. Interesting and detailed answer.
@Aamir: Everything on how to write strings in PHP is outlined in the PHP manual: php.net/string - And everything about how to write a PCRE regular expression is outlined in the Perl documentation (PCRE aims to be compatible): perldoc.perl.org/perlre.html#Extended-Patterns (sorry, it's a lot to read, and regexes are sometimes hard to wrap your head around, but these are both the references, so you can rely on them safely).
Thanks a lot for the references. Now I understand it a lot better, but why is $var; on its own not matching? As far as I understand, the same pattern should be able to match it, but it does not.
@Aamir: For me, it matches: eval.in/142892 - All I did was add it (and make the pattern more readable, but that should not have changed its behavior). Try also by just editing the original demo.

Regular expressions aren't a solution for everything. In this case, it's clear that you want to parse PHP code. Just like you shouldn't parse HTML with regex, you shouldn't parse PHP with regex.

Instead, use PHP's tokenizer, which can be used to parse PHP expressions.
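
As a rough illustration of how the tokenizer could apply to this question, here is a minimal sketch built on token_get_all(). The helper name isSimpleEchoArgument() is hypothetical, and the rule it enforces (constant strings and variables joined by dots, ending in a semicolon) is my assumption about what should count as valid, taken from the examples in the question:

```php
<?php
// Sketch only: isSimpleEchoArgument() is a hypothetical helper, not part of PHP.
function isSimpleEchoArgument(string $code): bool
{
    // token_get_all() only tokenizes PHP after an opening tag, so the
    // already-matched "echo" keyword is put back in front of the input.
    $tokens = token_get_all('<?php echo ' . $code);

    $expectOperand = true;  // true: need a string or variable; false: need "." or ";"
    $finished = false;      // becomes true once the closing ";" has been seen

    // The first two tokens are T_OPEN_TAG and T_ECHO from the prefix above.
    foreach (array_slice($tokens, 2) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $id = is_array($token) ? $token[0] : $token;

        if ($id === T_WHITESPACE) {
            continue;
        }
        if ($finished) {
            return false;   // something follows the semicolon
        }
        if ($expectOperand) {
            // Strings with interpolation ("hi $name") tokenize differently
            // and are rejected by this check.
            if ($id !== T_CONSTANT_ENCAPSED_STRING && $id !== T_VARIABLE) {
                return false;
            }
            $expectOperand = false;
        } elseif ($id === '.') {
            $expectOperand = true;
        } elseif ($id === ';') {
            $finished = true;
        } else {
            return false;
        }
    }

    return $finished;
}

var_dump(isSimpleEchoArgument('"how"."are"."you";'));
var_dump(isSimpleEchoArgument('"abc" $var;'));
```

The tokenizer takes care of quoting and escaping rules, so the remaining code only has to check that operands and operators alternate.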

1 Comment

I saw the tokenizer. I guess it is going to be useful for me in other parts of my project, but I didn't understand how it can be used with respect to this question.

You can do that with the following regex without needing to use recursion:

^"[^"]+"(\."[^"]+")*;$

Demo: http://regex101.com/r/oW5zH4
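
To try the pattern from PHP, a short sketch follows. The variable names are hypothetical, and $allowEmpty is my own variant that swaps + for * so that an empty string is accepted; it is not part of the pattern above:

```php
<?php
// The pattern from this answer. Inside single quotes PHP leaves "\." as is.
$original = '/^"[^"]+"(\."[^"]+")*;$/';

// Hypothetical variant: * instead of + so that "" is allowed.
$allowEmpty = '/^"[^"]*"(\."[^"]*")*;$/';

$subjects = ['"hiii";', '"how"."are"."you";', '"";', '$var."abc";'];

foreach ($subjects as $subject) {
    printf(
        "%-22s original: %d  allowEmpty: %d\n",
        $subject,
        preg_match($original, $subject),
        preg_match($allowEmpty, $subject)
    );
}
```

Both patterns accept the first two subjects. Only the variant accepts ""; and neither accepts $var."abc"; because both are anchored on a leading double quote.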

2 Comments

Thanks, that works great. Can we include other patterns in the same regex? (See the modified question.)
But it won't match ""; which is the reason I included * in the beginning. Sorry for not giving enough examples. Basically, I want to match almost all types of echo provided by PHP (as far as possible).
