PHP pattern search and replace

Question

I have a bunch of raw content in a database.

Some of it contains strings such as http://www.example.com/subfolder/name.pdf or /subfolder/name.pdf

I need a pattern replace that turns these into /wp-content/uploads/old/subfolder/name.pdf. There can be many levels of subfolders, for example /subfolder1/subfolder2/subfolder3/file.pdf

The patterns I use for finding them are

/http[^\s]+pdf/
/href="\/[^\s]+pdf/

But how do I replace the matched pattern with another pattern (the target path in the example above)?

So far I have

search for /http:\/\/www.example.com(.*).pdf"/
replace with /wp-content/uploads/old$1.pdf"

search for /href="\/pdf(.*)\.pdf">/

This works fine until there is more than one PDF link in one table cell.

example

<a href="/pdf/subdir/name.pdf">clickhere</a><a href="/pdf/subdir/name.pdf">2nd PDF</a>

What have you tried? What are the conditions for the match? Are there any exceptions? Please include examples for both.
– Mariano, Commented Sep 16, 2015 at 17:28
Are you aware your regex matches "xxxhttpxxxpdfxxxx.html"?
– Mariano, Commented Sep 16, 2015 at 17:35
Which database do you use? Regex replacement functions are available in Oracle, and in MySQL through some user-defined functions. A preg_replace call for this would be $out = preg_replace('&^(http://www.example.com/)(.*[.]pdf)$&', '$1wp-content/uploads/$2', $in); and likewise for the other form (if the URL is fixed; if not, replace it with a pattern like [^/]+).
– syck, Commented Sep 16, 2015 at 17:36

Community · Accepted Answer · 2017-05-23 12:22:23Z

This works fine until there is more than one PDF link in one table cell.

Quantifiers are greedy by default: they consume as much as they can while still allowing a match. To reverse this behaviour, you can use a lazy quantifier, as explained in this post: Greedy vs. Reluctant vs. Possessive Quantifiers. Adding an extra ? after a quantifier makes it consume as little as it can. To make your greedy construct lazy, use [^\s]+? (and, likewise, (.*?) instead of (.*)).
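
As a minimal sketch of the difference, the snippet below runs the question's href pattern against the two-link cell, first with the greedy (.*) and then with the lazy (.*?). The variable names are illustrative only.

```php
<?php
// Two PDF links in one cell, as in the question's example.
$cell = '<a href="/pdf/subdir/name.pdf">clickhere</a><a href="/pdf/subdir/name.pdf">2nd PDF</a>';

// Greedy: (.*) runs on to the LAST '.pdf">' in the string,
// so both links are swallowed by a single match.
$greedy = preg_match_all('/href="\/pdf(.*)\.pdf">/', $cell, $g);

// Lazy: (.*?) stops at the FIRST '.pdf">' it reaches,
// so each link is matched on its own.
$lazy = preg_match_all('/href="\/pdf(.*?)\.pdf">/', $cell, $l);

echo $greedy, "\n"; // prints 1
echo $lazy, "\n";   // prints 2
```

With the lazy version, each capture in $l[1] is just /subdir/name, which is what the replacement needs.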

some containing string http://www.example.com/subfolder/name.pdf or /subfolder/name.pdf

But how to replace the pattern with another pattern?

As you can see, "http://www.example.com" is optional. You can make a part of your pattern optional with a (?:group) and a ? quantifier.

Pattern with an optional group:

(?:http://www\.example\.com)?/(\S+?)\.pdf

Don't forget to escape the dots, as they have a special meaning in regex.
Notice I used \S (capital "S") instead of [^\s] (they are both exactly the same).
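
To see the optional group at work, here is a small sketch that applies the pattern above to the absolute form, the relative form, and a multi-level path from the question. The "@" delimiter and the variable names are choices made for this illustration.

```php
<?php
// "@" as delimiter, so the slashes in the URL need no escaping.
$pattern = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';
$replacement = '/wp-content/uploads/old/$1.pdf';

// The host prefix is consumed by the optional group when present...
$absolute = preg_replace($pattern, $replacement, 'http://www.example.com/subfolder/name.pdf');

// ...and simply skipped when absent.
$relative = preg_replace($pattern, $replacement, '/subfolder/name.pdf');

// \S+? also matches slashes, so any number of subfolder levels is captured.
$nested = preg_replace($pattern, $replacement, '/subfolder1/subfolder2/subfolder3/file.pdf');

echo $absolute, "\n";
echo $relative, "\n";
echo $nested, "\n";
```

Both the absolute and the relative input end up as /wp-content/uploads/old/subfolder/name.pdf.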

One more thing: consider adding some boundaries to your pattern. I suggest using (?<!\w) (not preceded by a word character) and \b (a word boundary) to avoid matching part of another word (as I commented on your question).

Regex:

(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b
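
A quick way to check what the boundaries buy you is to try a few inputs that should not match. The sketch below uses the string from the comment on the question plus a few made-up test strings (these inputs are assumptions for illustration, not data from the question).

```php
<?php
$original = '/http[^\s]+pdf/';  // first pattern from the question
$bounded  = '@(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b@';

// The false positive pointed out in the comments: "http" and "pdf"
// merely appear somewhere inside an unrelated word.
$falsePositive = preg_match($original, 'xxxhttpxxxpdfxxxx.html');
$rejected      = preg_match($bounded, 'xxxhttpxxxpdfxxxx.html');

// \b fails because "pdf" is followed by another word character.
$longerExtension = preg_match($bounded, 'see /docs/file.pdfx');

// (?<!\w) fails because the slash is preceded by a word character.
$insideWord = preg_match($bounded, 'abc/def.pdf');

// A real link followed by punctuation still matches.
$realLink = preg_match($bounded, 'see /docs/file.pdf.');

var_dump($falsePositive, $rejected, $longerExtension, $insideWord, $realLink);
```

Only the first and the last call report a match; the three bounded calls in between all return 0.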

Code:

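// "@" is the delimiter, so the slashes in the URL need no escaping; "i" makes the match case-insensitive.
$re = "@(?<!\\w)(?:http://www\\.example\\.com)?/(\\S+?)\\.pdf\\b@i";
$str = "some containing string http://www.example.com/subfolder/name.pdf
        or /subfolder/name.pdf
        <a href=\"/pdf/subdir/name.pdf\">clickhere</a>
        <a href=\"/pdf/subdir/name.pdf\">2nd PDF</a>";
// $1 is the captured path, without the leading slash and without ".pdf".
$subst = "/wp-content/uploads/old/$1.pdf";

// preg_replace replaces every match in $str, not just the first one.
$result = preg_replace($re, $subst, $str);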

Test in regex101

Mark · Accepted Answer · 2015-09-16 17:36:40Z

A sandbox example here: http://sandbox.onlinephpfunctions.com/code/cc47b98d16981b786cf2d573751b6a09a9725b90

$array = [
     "https://test.com/url/subfolder/subfolder/file.pdf",
     "https://test.com/url/subfolder1/subfolder/file.pdf",
     "/url/subfolder3/subfolder3/files.xml",
     "/url/subfolder/subfolder/file.pdf"
];

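// Strips an optional scheme and host from each URL, then prepends the new base path.
// Note: every entry is rewritten, including the .xml one, because the pattern does not check the extension.
function setwpUrl($urls, $prepend) {
    foreach ($urls as $i => $url) {
        // Group 1: optional "http(s)://host"; group 2: the remaining path.
        preg_match("/(https?:\/\/[a-zA-Z0-9\.\-]+)?(.*)/", $url, $out);
        $urls[$i] = $prepend . $out[2];
    }
    return $urls;
}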

$newUrls = setwpUrl($array, "/wp-content/uploads/old");

var_dump($newUrls);

answered Sep 16, 2015 at 17:36
