I have a regex for finding all function definitions. What I want to do now is to also get the contents of the functions, e.g. as a third field in $matches. Is that possible using regex, or do I need some push-pop machine because of the nesting of {} brackets? What I want to write is a script which analyzes PHP code and figures out which functions have dependencies. If there is already such a script, let me know!

$content = file_get_contents($fileName);
preg_match_all("/(function )(\w+\(.*?\))/", $content, $matches);

I don't want to use the PHP tokenizer because it also picks up some "hidden functions" like predefined functions and that kind of stuff, but I just want the functions written in the code.

  • A regex won't get you very far at all; you need a proper parser, like NikiC's PHP Parser. Commented May 28, 2013 at 6:37
  • I agree with @deceze, it'd be better to use a parser. With regex it's hard. Commented May 28, 2013 at 6:43
  • Isn't it a bit too much? I just want to analyze simple functions and create a graph of their dependencies using Graphviz. Commented May 28, 2013 at 6:46
  • Nope, it's not overkill, it's the right tool for the job. PHP is not a regular language, therefore regular expressions are the wrong tool for the job. Come on, you don't even know where to start using regexen, right? :-3 Commented May 28, 2013 at 6:59
  • Maybe this would be helpful? forums.phpfreaks.com/topic/… Commented May 28, 2013 at 7:09

1 Answer

Even if, for better or worse, you're not Noam Chomsky, you should understand this:

PHP is not a regular language, so it cannot be described or parsed by regular expressions.

To be a regular language, a language needs to be, among other things, context free.

[Figure: language hierarchies]

Loosely speaking, "context free" means that a "word" in the language means the same thing regardless of where it occurs. This is not the case for PHP. In fact, even your simple snippet to find function signatures already crashes and burns here:

// function foo()

The context of a comment voids this function keyword of its usual meaning. Not to mention:

'function foo()';
<<<HERE
    function foo()
HERE;

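and a host of similar examples. The function keyword (and everything else too) depends on context, so PHP is not context free in this sense, and therefore not regular and not feasibly parseable by regular expressions. The nested {} blocks you ask about are a second problem: balanced nesting to arbitrary depth is the textbook example of something a regular expression in the formal sense cannot match.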
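To see the problem concretely, here is a small sketch that runs the regex from the question over a made-up snippet. The sample source is an assumption for illustration; only bar() is a real definition in it, yet the pattern reports three.

```php
<?php
// Hypothetical input: only bar() is an actual function definition.
$content = <<<'SRC'
<?php
// function foo()
$s = 'function baz()';
function bar($x) { return $x; }
SRC;

// The pattern from the question, unchanged.
preg_match_all("/(function )(\w+\(.*?\))/", $content, $matches);

// Lists foo() and baz() alongside bar($x): the comment and the string
// literal are matched just like the real definition.
print_r($matches[2]);
```

The regex has no notion of "inside a comment" or "inside a string", so it cannot tell the three occurrences apart.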

Use a parser.
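If a full parser feels like too much, a minimal sketch with PHP's built-in token_get_all() may already be enough for simple files. It only splits the given source text into tokens, so a definition shows up as a T_FUNCTION token while a call to a predefined function is just a T_STRING. The helper name extractFunctions() and the sample source are assumptions for illustration, and the sketch ignores namespaces and class scope (methods are listed by bare name).

```php
<?php
// Sketch: collect named function definitions and their bodies from PHP source.
// extractFunctions() is a hypothetical helper, not part of any library.
function extractFunctions(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $functions = [];

    for ($i = 0; $i < $count; $i++) {
        // Comments and strings are separate token types, so they never match here.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        // The function name is the next T_STRING; closures have none.
        $name = null;
        for ($j = $i + 1; $j < $count; $j++) {
            $t = $tokens[$j];
            if (is_array($t) && $t[0] === T_STRING) {
                $name = $t[1];
                break;
            }
            if ($t === '(' || $t === ';') {
                break;
            }
        }
        if ($name === null) {
            continue;
        }

        // Collect the body by counting braces at the token level.
        $depth = 0;
        $body = '';
        for ($k = $j; $k < $count; $k++) {
            $t = $tokens[$k];
            if ($depth === 0 && $t === ';') {
                break; // declaration without a body (interface, abstract, use function)
            }
            // "{$x}" and "${x}" inside strings open with special tokens but close with a plain "}".
            $opens = $t === '{'
                || (is_array($t) && in_array($t[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true));
            if ($opens) {
                $depth++;
            }
            if ($depth > 0) {
                $body .= is_array($t) ? $t[1] : $t;
            }
            if ($t === '}' && --$depth === 0) {
                $functions[$name] = $body;
                break;
            }
        }
    }

    return $functions;
}

// Hypothetical input: the comment, the string and the "}" literal must not confuse the scan.
$source = <<<'SRC'
<?php
// function foo()
$s = 'function baz()';
function bar($x) {
    if ($x) { return '}'; }
    return strlen($x);
}
SRC;

$found = extractFunctions($source);
print_r(array_keys($found));
```

From the collected bodies, the T_STRING tokens followed by "(" would be the candidates for dependency edges, which can then be filtered against the keys of the result to drop predefined functions.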

2 Comments

True... sometimes I should use the things I learned in theoretical informatics ;)
While it is true that regex is not the right tool here, it's worth noting that PCRE regex is able to match words in specific context (via lookarounds and other zero-width assertions, and also capturing alternation groups), and also supports recursion and subroutine calls. PHP regular expressions are not "regular" in Chomsky's terms.
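As a rough illustration of the recursion this comment mentions, here is a sketch of a PCRE pattern whose third group calls itself via (?3) to match nested braces, which puts the body in the third field of $matches. The pattern and the sample source are assumptions for illustration; the pattern is still fooled by braces or the function keyword inside strings and comments, so it is not a substitute for a parser.

```php
<?php
// Group 1: name, group 2: parameter list, group 3: body with balanced braces.
// (?3) re-enters group 3, so nested { ... } blocks are matched recursively.
$pattern = '/function\s+(\w+)\s*\(([^)]*)\)\s*(\{(?:[^{}]++|(?3))*\})/';

// Hypothetical input with one level of nesting inside outer().
$content = <<<'SRC'
<?php
function outer($a) {
    if ($a) { return inner($a); }
    return 0;
}
function inner($b) { return $b + 1; }
SRC;

preg_match_all($pattern, $content, $matches);

print_r($matches[1]); // function names
print_r($matches[3]); // bodies, including the outer braces
```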
