
I have a bunch of raw content in a database.

Some of it contains strings like http://www.example.com/subfolder/name.pdf or /subfolder/name.pdf.

I need a pattern replace that turns these into /wp-content/uploads/old/subfolder/name.pdf. There can be many levels of subfolders, e.g. /subfolder1/subfolder2/subfolder3/file.pdf.

The patterns I use for finding them are:

/http[^\s]+pdf/
/href="\/[^\s]+pdf/

But how do I replace the pattern with another pattern (as in the example above)?

So far I have:

search for /http:\/\/www.example.com(.*).pdf"/
replace with /wp-content/uploads/old$1.pdf"

search for /href="\/pdf(.*)\.pdf">/

This works fine until there is more than one PDF link in a single table cell.

Example:

<a href="/pdf/subdir/name.pdf">clickhere</a><a href="/pdf/subdir/name.pdf">2nd PDF</a>

  • What have you tried? What are the conditions for the match? Are there any exceptions? Please include examples for both Commented Sep 16, 2015 at 17:28
  • Are you aware your regex matches "xxxhttpxxxpdfxxxx.html"? Commented Sep 16, 2015 at 17:35
  • Which database do you use? Regex replacement functions are available in Oracle, and through some user-defined functions in MySQL. A preg_replace call for this would be $out = preg_replace('&^(http://www.example.com/)(.*[.]pdf)$&', '$1wp-content/uploads/$2', $in); and likewise for the other case (if the URL is fixed; replace it with a pattern like [^/]+ if not) Commented Sep 16, 2015 at 17:36
  • updated with what I have Commented Sep 16, 2015 at 17:38

2 Answers

this works fine until there are more than 1 pdf links in one table cell

The regex engine is greedy by default: a quantifier consumes as much as it can while still allowing the overall match to succeed. To reverse this behaviour, use a lazy quantifier, as explained in this post: Greedy vs. Reluctant vs. Possessive Quantifiers. Adding an extra ? after a quantifier makes it match as little as possible. To make your greedy construct lazy, use [^\s]+? (or (.*?) in place of (.*)).
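To illustrate the difference, here is a minimal sketch run against a cell with two links. The second file name (other.pdf) is an assumption added only to tell the two links apart, and @ is used as the delimiter to avoid escaping slashes.

```php
<?php
// Two PDF links in one cell, similar to the example in the question.
$html = '<a href="/pdf/subdir/name.pdf">clickhere</a>'
      . '<a href="/pdf/subdir/other.pdf">2nd PDF</a>';

// Greedy: (.*) runs on to the LAST '.pdf">' in the string,
// so both links are swallowed by a single match.
preg_match_all('@href="/pdf(.*)\.pdf">@', $html, $greedy);

// Lazy: (.*?) stops at the FIRST '.pdf">' it can reach,
// so each link produces its own match.
preg_match_all('@href="/pdf(.*?)\.pdf">@', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
print_r($lazy[1]);            // captured paths: /subdir/name and /subdir/other
```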

some containing string http://www.example.com/subfolder/name.pdf or /subfolder/name.pdf

But how to replace the pattern with another pattern?

As you can see, "http://www.example.com" is optional. You can make a part of your pattern optional with a (?:group) and a ? quantifier.

Pattern with an optional group:

(?:http://www\.example\.com)?/(\S+?)\.pdf
  • Don't forget to escape the dots, as they have a special meaning in regex.
  • Notice I used \S (capital "S") instead of [^\s] (they are both exactly the same).
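As a quick check, the optional group can be tried on both forms from the question. This is only a sketch: the $inputs array and variable names are made up for the demonstration.

```php
<?php
// The host part is optional, so absolute and root-relative links both match.
$pattern = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';

$inputs = [
    'http://www.example.com/subfolder/name.pdf',
    '/subfolder1/subfolder2/subfolder3/file.pdf',
];

$rewritten = [];
foreach ($inputs as $input) {
    // $1 holds the path without the leading slash and without ".pdf".
    $rewritten[] = preg_replace($pattern, '/wp-content/uploads/old/$1.pdf', $input);
}

print_r($rewritten);
// [0] => /wp-content/uploads/old/subfolder/name.pdf
// [1] => /wp-content/uploads/old/subfolder1/subfolder2/subfolder3/file.pdf
```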


One more thing: consider adding some boundaries to your pattern. I suggest using (?<!\w) (not preceded by a word character) at the start and \b (a word boundary) at the end, to avoid matching part of another word (as I commented on your question).

Regex:

(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b
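To see what the boundaries change, here is a small sketch that tests the pattern against a few strings. The sample strings are assumptions chosen for illustration, except the last one, which comes from the comment on the question.

```php
<?php
$pattern = '@(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b@i';

$samples = [
    'see /docs/a.pdf now',               // slash follows a space: matches
    'http://www.example.com/docs/a.pdf', // absolute form: matches
    'path/docs/a.pdf',                   // every slash follows a word character: no match
    '/docs/a.pdfx',                      // no word boundary after "pdf": no match
    'xxxhttpxxxpdfxxxx.html',            // no slash and no ".pdf": no match
];

$matched = [];
foreach ($samples as $sample) {
    // preg_match returns 1 when the pattern is found, 0 when it is not.
    $matched[$sample] = preg_match($pattern, $sample) === 1;
}

var_dump($matched);
```

Note that the lookbehind only inspects the single character before the slash (or before "http"), so it is a heuristic rather than a full URL check.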

Code:

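// "@" is the delimiter, so slashes need no escaping; the "i" flag makes the match case-insensitive.
$re = "@(?<!\\w)(?:http://www\\.example\\.com)?/(\\S+?)\\.pdf\\b@i";
$str = "some containing string http://www.example.com/subfolder/name.pdf
        or /subfolder/name.pdf
        <a href=\"/pdf/subdir/name.pdf\">clickhere</a>
        <a href=\"/pdf/subdir/name.pdf\">2nd PDF</a>";
// $1 is the captured path, without the leading slash and without ".pdf".
$subst = "/wp-content/uploads/old/$1.pdf";

$result = preg_replace($re, $subst, $str);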

Test in regex101


A sandbox example here: http://sandbox.onlinephpfunctions.com/code/cc47b98d16981b786cf2d573751b6a09a9725b90

$array = [
     "https://test.com/url/subfolder/subfolder/file.pdf",
     "https://test.com/url/subfolder1/subfolder/file.pdf",
     "/url/subfolder3/subfolder3/files.xml",
     "/url/subfolder/subfolder/file.pdf"
];

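// Strips an optional scheme and host from each URL and prepends $prepend to what is left.
// Note: this prefixes every entry, including non-PDF ones such as files.xml.
function setwpUrl($urls, $prepend) {
    for($i = 0; $i < count($urls); $i++) {
        // Group 1: optional scheme and host. Group 2: the rest of the URL.
        preg_match_all("/(https?:\/\/[a-zA-Z0-9\.\-]+)?(.*)/", $urls[$i], $out);
        $urls[$i] = $prepend . $out[2][0];
    }
    return $urls;
}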

$newUrls = setwpUrl($array, "/wp-content/uploads/old");

var_dump($newUrls);
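If only the PDF links should be moved, one possible variant is to anchor the pattern and require a .pdf ending, so other entries pass through untouched. This is a sketch under that assumption: prefixPdfUrls is a hypothetical helper name, not part of the code above, and it assumes the prefix contains no $ or backslash characters.

```php
<?php
// Hypothetical helper: removes an optional scheme and host and prepends $prefix,
// but only for entries whose path ends in ".pdf".
function prefixPdfUrls(array $urls, string $prefix): array
{
    // preg_replace accepts an array subject and returns an array of results;
    // entries that do not match the pattern are returned unchanged.
    return preg_replace(
        '@^(?:https?://[a-zA-Z0-9.\-]+)?(/.*\.pdf)$@i',
        $prefix . '$1',
        $urls
    );
}

$urls = [
    "https://test.com/url/subfolder/subfolder/file.pdf",
    "/url/subfolder3/subfolder3/files.xml",
    "/url/subfolder/subfolder/file.pdf",
];

$pdfUrls = prefixPdfUrls($urls, "/wp-content/uploads/old");
var_dump($pdfUrls);
```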
