
I'm working on a regex, but I can't get it to work.

I'm scanning .php documents with PHP, looking for calls of the form $__this('[TEXT]') or $__this("[TEXT]").

So my question is: can somebody help me with a regex that finds $__this('[TEXT]') or $__this("[TEXT]") in a string and returns [TEXT]?

UPDATE (with answer, thanks to @Explosion Pills):

$string = '$__this("Foo Bar<br>HelloHello")';
preg_match('/\$__this\(([\'"])(.*?)\1\)/xi', $string, $matches);
print_r($matches); // $matches[1] is the quote character, $matches[2] is the text
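A minimal sketch wrapping the update's pattern in a helper (the name extract_this_text is hypothetical). It shows why the backreference \1 matters: the closing quote must be the same character as the opening one, so an apostrophe inside a double-quoted argument stays part of the text and a mismatched pair is rejected. The x and i flags are dropped here on the assumption that they are not needed: the pattern has no whitespace or comments to ignore, and PHP variable names are case-sensitive anyway.

```php
<?php
// Hypothetical helper around the pattern from the update.
// Group 1 captures the opening quote; \1 forces the closing quote to match it.
function extract_this_text(string $string): ?string
{
    if (preg_match('/\$__this\(([\'"])(.*?)\1\)/', $string, $matches) === 1) {
        return $matches[2]; // group 2 is the text between the quotes
    }
    return null; // no $__this('...') or $__this("...") call found
}

var_dump(extract_this_text('$__this("Foo Bar<br>HelloHello")')); // string(21) "Foo Bar<br>HelloHello"
var_dump(extract_this_text('$__this("it\'s")'));                  // string(4) "it's"
var_dump(extract_this_text('$__this(\'mismatched")'));            // NULL
```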
  • Even if the text is given back, you still need to do extra processing to get back the actual string, though. Commented Feb 1, 2013 at 14:09
  • You can use this: php.net/manual/en/function.token-get-all.php to tokenize the PHP file into PHP tokens. The rest should not be that hard. Commented Feb 1, 2013 at 14:11
  • You would probably need something like this: \$__this(('|")[(\w+)]('|")) Commented Feb 1, 2013 at 14:11
  • Why would you want to scan PHP code like that? Please tell me you're not planning on writing a PHP program that modifies other PHP programs? That's a scary thought. Commented Feb 1, 2013 at 14:14
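Following the token_get_all suggestion, a sketch that lets PHP's own tokenizer find the calls and then decodes the string literal (the "extra processing" mentioned in the first comment). The helper names find_this_calls and decode_literal are hypothetical. It assumes the tokenizer extension is available (it is enabled by default in standard builds) and only handles an argument that is a single plain string literal; interpolated strings such as "Hi $name" tokenize differently and are skipped. Decoding double-quoted literals with stripcslashes() is an approximation of PHP's escape rules, not an exact match.

```php
<?php
// Hypothetical helper: turn a PHP string literal token into its value.
function decode_literal(string $literal): string
{
    $quote = $literal[0];
    $body  = substr($literal, 1, -1);
    if ($quote === "'") {
        // Single-quoted strings only recognise the escapes \\ and \'.
        return strtr($body, ['\\\\' => '\\', "\\'" => "'"]);
    }
    // Double-quoted: stripcslashes() approximates PHP's rules (handles \n, \t,
    // \", \\, octal and hex escapes, but not \u{...} exactly). Assumption.
    return stripcslashes($body);
}

// Hypothetical helper: find every $__this(<plain string literal>) call.
function find_this_calls(string $code): array
{
    // Drop whitespace and comments so only meaningful tokens remain.
    $tokens = array_values(array_filter(
        token_get_all($code),
        fn($t) => !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
    ));

    $found = [];
    for ($i = 0, $n = count($tokens); $i + 3 < $n; $i++) {
        if (is_array($tokens[$i]) && $tokens[$i][0] === T_VARIABLE && $tokens[$i][1] === '$__this'
            && $tokens[$i + 1] === '('
            && is_array($tokens[$i + 2]) && $tokens[$i + 2][0] === T_CONSTANT_ENCAPSED_STRING
            && $tokens[$i + 3] === ')') {
            $found[] = decode_literal($tokens[$i + 2][1]);
        }
    }
    return $found;
}
```

Because the tokenizer already knows where each string literal starts and ends, escaped quotes inside the text need no special regex handling.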

3 Answers

2
preg_match('/
    \$__this # just $__this.  $ is meta character and must be escaped
    \(       # open paren also must be escaped
    ([\'"])  # open quote (capture for later use).  \' is needed in string
    (\[      # start capture.  open bracket must also be escaped
    .*?      # Ungreedily capture whatever is between the quotes
    \])      # close the open bracket and end capture
    \1       # close the quote (captured earlier)
    \)       # close the parentheses
/xi'         # ignore whitespace in pattern, allow comments, case insensitive
, $document, $matches);

The captured text will be in $matches[2]. Note that preg_match stops at the first match in the whole subject, not once per line; to collect every occurrence, use preg_match_all.
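A sketch of the preg_match_all variant, assuming (as discussed in the comments below) that [TEXT] is only a placeholder, so the literal \[ and \] are dropped from the pattern. The sample $document is made up for illustration.

```php
<?php
// Made-up sample document containing two calls.
$document = <<<'DOC'
<p><?= $__this('Foo Bar') ?></p>
<p><?= $__this("Hello<br>World") ?></p>
DOC;

// Same idea as the answer's x-mode pattern, minus the literal brackets.
$pattern = '/
    \$__this \(   # the call: $ and ( are escaped
    ([\'"])       # group 1: opening quote
    (.*?)         # group 2: the text, matched lazily
    \1 \)         # the same quote again, then the closing paren
/x';

preg_match_all($pattern, $document, $matches);
print_r($matches[2]); // one entry per call, in document order
```

With the default PREG_PATTERN_ORDER, $matches[2] is a flat list of every group-2 capture.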


7 Comments

This works if all the text has the format [something here], but I think [] is just a marker meaning that part can be anything. The question is quite unclear on this point.
@nhahtdh if the [] are not actually a part of the string, then you can simply remove them from my pattern. I thought they were specifically there, though. Does that make my answer unworthy of an upvote?
If the text contains ' or ", and you also remove the [], then your answer deserves a downvote.
Nhahtdh is right. This works: $string = '$__this("[TEXT]")'; But I also want to match, for example: $string = '$__this("Foo Bar")'; PS: I will update this question when I've got it :-)
@nhahtdh I don't understand what you mean. Are you telling me to change something? I will remove the \[ and \]. What do you mean by the text containing ' or "?
0

How about:

preg_match('/\$__this(?:(\'|")\((.+?)\)\1)/', $string);

explanation:

(?-imsx:\$__this(?:(\'|")\((.+?)\)\1))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \$                       '$'
----------------------------------------------------------------------
  __this                   '__this'
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      \'                       '''
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      "                        '"'
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
    \1                       what was matched by capture \1
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

1 Comment

You're missing the parentheses and brackets
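As the comment says, the pattern above puts the quote before the escaped parenthesis, so it looks for $__this'(...)' rather than $__this('...'). The call also never passes a $matches argument, so the captured text is lost. A corrected sketch that keeps the answer's structure (alternation for the quote, lazy .+?, backreference \1):

```php
<?php
$string = '$__this("Foo Bar")';

// \( now comes first, then the captured quote, the text, the same quote, and \).
if (preg_match('/\$__this\((\'|")(.+?)\1\)/', $string, $matches)) {
    echo $matches[2], "\n"; // Foo Bar
}
```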
0

Here's a solution that will catch strings with quotes and apostrophes in them as well.

$txt = "
blah blah blah
blah \$__this('abc') blah
blah \$__this('a\"b\"c') blah blah \$__this('a\"b\"c\'')
\$__this(\"123\");\$__this(\"1'23\") \$__this(\"1'23\\\"\")
";

$matches = array();
preg_match_all('/(?:\$__this\()(?:[\'"])(.*?[^\\\])(?:[\'"])(?:\))/im', $txt, $matches);
print_r($matches[1]); // group 1 holds the text between the quotes
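The pattern in this answer does not tie the closing quote to the opening one, so a mismatched call such as ('abc") is also accepted, and the captured text still contains its escape backslashes. A sketch of an alternative, assuming that a backslash in the text always escapes the next character: the opening quote is captured and reused as \1, the body accepts either an escaped pair \\. or any character that is not that quote, and a simplified unescaping step removes a backslash in front of a quote or another backslash.

```php
<?php
// Sample text; the last call has mismatched quotes and should be skipped.
$txt = <<<'TXT'
blah $__this('abc') blah
blah $__this('it\'s') blah $__this("say \"hi\"")
$__this("1'23") $__this('mismatched")
TXT;

// Group 1: opening quote. Group 2: escaped pairs or any char that is not that quote.
$pattern = '/\$__this\((["\'])((?:\\\\.|(?!\1).)*)\1\)/';
preg_match_all($pattern, $txt, $matches);

// Simplified unescaping (assumption): drop a backslash before \, ' or ".
$texts = array_map(
    fn(string $raw): string => preg_replace('/\\\\([\\\\\'"])/', '$1', $raw),
    $matches[2]
);
print_r($texts);
```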
