
PHP Regex

Source:

http://example.com/wp-content/uploads/2017/01/image.jpg
https://example.com/wp-content/uploads/2017/01/image2.jpg
http://example.com/wp-content/plugins/example-plugin/images/image.jpg

Objective

I want to match all strings that:

  • Contain HTTP, but not HTTPS
  • Contain wp-content/uploads/

...and I do not want to capture the wp-content/uploads/ part, so as far as I can figure, that calls for a non-capturing group.

I have tried a positive lookahead, but I can't seem to get it right. This is what I've come up with so far, but I don't know where to put the HTTP part. The regex tester at regex101 just doesn't match.

(?=(?:(wp-content\/uploads)+))
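On the question of where the HTTP part goes, here is a minimal PHP sketch, assuming the goal described in the comments below (switching http:// to https:// only for uploads URLs). The literal http:// comes first and the lookahead follows it. The [^\s'"]* run that keeps the lookahead inside a single URL is an assumption about where a URL ends, not something from the question.

```php
<?php
// The literal "http://" is matched first; the lookahead then only checks
// that "wp-content/uploads/" appears later in the same URL.
// Assumption: a URL ends at whitespace or a quote, hence [^\s'"]*.
// A lookahead consumes nothing, so the whole match is just "http://"
// and wp-content/uploads/ is never part of it.
$pattern = '~http://(?=[^\s\'"]*wp-content/uploads/)~';

$urls = [
    'http://example.com/wp-content/uploads/2017/01/image.jpg',
    'https://example.com/wp-content/uploads/2017/01/image2.jpg',
    'http://example.com/wp-content/plugins/example-plugin/images/image.jpg',
];

foreach ($urls as $url) {
    // Only the matched "http://" is replaced; the rest of the URL is untouched.
    echo preg_replace($pattern, 'https://', $url), "\n";
}
```

Only the first URL changes: the second has no "http://" substring, and the third fails the lookahead.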

Update:

To clarify, I need a plain regex, not PHP code; in other words, PCRE, which is what PHP uses.

  • Is if (parse_url($url, PHP_URL_SCHEME) == "https") { return false; } not suitable for you? Commented Jan 10, 2017 at 14:55
  • Are your URLs already isolated (each one in a separate string), or are all of them inside a larger string? Commented Jan 10, 2017 at 14:56
  • These strings are present in post_content and other database columns on a WordPress installation. My objective is to replace http with https using a search-and-replace tool: interconnectit.com/products/… Commented Jan 10, 2017 at 15:00
  • It's more about the flags than the RegExp, e.g. ~^http://[^\v]+/wp-content/uploads[^\v]+$~mg needs to be multiline (m) and global (g): regex101.com/r/hId37t/1 Commented Jan 10, 2017 at 16:45
  • You may want to go with http://[^\s'"]+?wp-content/uploads/[^\s"']+ Commented Jan 10, 2017 at 20:46
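To try the last comment's pattern on content shaped like post_content, a minimal PHP sketch follows. The HTML snippet is hypothetical, not taken from the question. Note that PHP's preg functions have no "g" modifier; preg_match_all() is what returns every match rather than only the first.

```php
<?php
// Hypothetical post_content snippet: one http uploads URL, one https
// uploads URL and one http plugins URL, inside HTML attributes.
$content = '<p><img src="http://example.com/wp-content/uploads/2017/01/image.jpg"></p>'
    . "\n"
    . '<a href="https://example.com/wp-content/uploads/2017/01/image2.jpg">x</a>'
    . "\n"
    . '<img src="http://example.com/wp-content/plugins/example-plugin/images/image.jpg">';

// Pattern from the comment; the character classes stop at whitespace
// and quotes, so a match cannot run past the end of an attribute value.
$pattern = '~http://[^\s\'"]+?wp-content/uploads/[^\s"\']+~';

// preg_match_all() plays the role of the "g" flag in PHP.
preg_match_all($pattern, $content, $matches);
print_r($matches[0]);
```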

3 Answers


Something like this:

<?php
$strings = [
    'http://example.com/wp-content/uploads/2017/01/image.jpg',
    'https://example.com/wp-content/uploads/2017/01/image2.jpg',
    'http://example.com/wp-content/plugins/example-plugin/images/image.jpg'
];
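// "~" is used as the delimiter: with "/" delimiters, the unescaped "/" after
// "uploads" would end the pattern early and preg_match() would fail.
$pattern = '~(http[^s]).+(wp-content/uploads/)(.+)~';
foreach ($strings as $subject) {
    if (preg_match($pattern, $subject, $matches)) {
        // Group 3 holds the path after wp-content/uploads/, e.g. "2017/01/image.jpg".
        echo $matches[3] . "\n";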
    }
}

2 Comments

Instead of using foreach and preg_match, you should consider using preg_grep.
That is not what the OP wants: "I do not want to capture the wp-content/uploads/".
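A minimal sketch of the preg_grep suggestion, assuming the same three sample strings. The anchored ^http:// prefix is a substitution for the answer's http[^s], chosen here so the check applies to the start of each string.

```php
<?php
$strings = [
    'http://example.com/wp-content/uploads/2017/01/image.jpg',
    'https://example.com/wp-content/uploads/2017/01/image2.jpg',
    'http://example.com/wp-content/plugins/example-plugin/images/image.jpg',
];

// preg_grep() returns the entries that match, keeping their original keys,
// so no foreach/preg_match loop is needed to filter the list.
$matching = preg_grep('~^http://.*wp-content/uploads/~', $strings);

print_r($matching);
```

This filters whole strings rather than extracting a part, so it sidesteps the capturing question entirely.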

If you are just looking for a yes/no answer, the following would work:

http[^s].*wp-content\/uploads\/

See https://regex101.com/r/kPxOMt/1 for an example.

If you're looking to capture part of the url, please tell me and I'll update the regex.
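The yes/no check can be sketched in PHP with preg_match(), which returns 1 on a match and 0 otherwise (false on error). Using "~" as the delimiter is a choice made here so the slashes need no escaping. One caveat: [^s] accepts any single character after "http", so something like "httpx" would also pass.

```php
<?php
$urls = [
    'http://example.com/wp-content/uploads/2017/01/image.jpg',
    'https://example.com/wp-content/uploads/2017/01/image2.jpg',
    'http://example.com/wp-content/plugins/example-plugin/images/image.jpg',
];

// Same pattern as above, with "~" delimiters instead of escaped slashes.
$pattern = '~http[^s].*wp-content/uploads/~';

foreach ($urls as $url) {
    // Compare strictly with 1 so an error (false) is not treated as a match.
    $isMatch = preg_match($pattern, $url) === 1;
    echo ($isMatch ? 'YES' : 'NO'), "  $url\n";
}
```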


"I do not want to capture the wp-content/uploads/"

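$string = 'http://example.com/wp-content/uploads/2017/01/image.jpg';
$res = '';
// Capture the parts before and after wp-content/uploads/, then join them.
if (preg_match('~(http://.*?)wp-content/uploads/(\S*)~', $string, $m)) {
    $res = $m[1] . $m[2]; // "http://example.com/2017/01/image.jpg"
}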
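To see what this approach yields across all three sample URLs, here is a small usage sketch; the $urls and $results names are illustrative. The folder name is still matched, but it is left out of both capture groups, and a non-matching string leaves $res empty.

```php
<?php
$urls = [
    'http://example.com/wp-content/uploads/2017/01/image.jpg',
    'https://example.com/wp-content/uploads/2017/01/image2.jpg',
    'http://example.com/wp-content/plugins/example-plugin/images/image.jpg',
];

$results = [];
foreach ($urls as $string) {
    $res = '';
    // Group 1: "http://" up to the folder; group 2: the rest of the path.
    if (preg_match('~(http://.*?)wp-content/uploads/(\S*)~', $string, $m)) {
        $res = $m[1] . $m[2];
    }
    $results[] = $res;
}

print_r($results);
```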

2 Comments

According to the OP, "These strings are present in post_content and other database columns on a WordPress installation", so this won't guarantee matching the right things.
@revo: Thanks for the edit ;) The OP needs to show real strings before anyone can give a completely valid regex. This one works on the examples actually given.
