<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)

Given that this is all on one line, how can I match, or better yet extract, all full URLs and text? I.e., for this example I wish to extract:

http://www.someurl(.+) . maybe some text here(.*) . www.someotherurl(.+) . maybe even more text(.*)

Basically, <Link.*:.* would start each link capture and > would end it. The text after each link should be captured as well, up to the next link, if there is one.

I have tried:

preg_match_all('/<Link.*?:.*?(https|http|www)(.+?)>(.*?)/', $v1, $m4);

but I need a way to capture the text after the closing >. The problem is that there may or may not be another link after the first one (of course there could also be no links to begin with!).

  • It might be easier to use preg_split with a pattern that matches a full URL. Commented Dec 10, 2013 at 20:58

2 Answers

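// Sample input from the question.
$string = "<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)";
// Split on each <Link ...> tag. PREG_SPLIT_DELIM_CAPTURE keeps the captured link target,
// and PREG_SPLIT_NO_EMPTY drops the empty piece before the first tag.
$parts = preg_split('~<link(?: to)?:\s*([^>]+)>~i',$string,-1,PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
echo "<pre>";
print_r($parts);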

output:

Array
(
    [0] => http://www.someurl(.+)
    [1] =>  maybe some text here(.*) 
    [2] => www.someotherurl(.+)
    [3] =>  maybe even more text(.*)
)
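
The flat array above does not say which entries are links and which are text, and the order shifts if the input starts with plain text. A minimal sketch of one way around that, assuming PREG_SPLIT_NO_EMPTY is dropped so that the pieces strictly alternate between text and link. The function name splitLinksAndText and the 'leading'/'pairs' keys are hypothetical, not part of the answer:

```php
<?php
// Hypothetical helper: pairs each link with the text that follows it.
function splitLinksAndText(string $input): array
{
    // Without PREG_SPLIT_NO_EMPTY the result strictly alternates:
    // even indexes hold plain text, odd indexes hold captured link targets.
    $parts = preg_split('~<link(?: to)?:\s*([^>]+)>~i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Index 0 is whatever precedes the first link (possibly an empty string).
    $result = ['leading' => trim($parts[0]), 'pairs' => []];

    for ($i = 1; $i < count($parts); $i += 2) {
        $result['pairs'][] = [
            'link' => trim($parts[$i]),
            // The text piece after a link may be empty if another link follows directly.
            'text' => trim($parts[$i + 1] ?? ''),
        ];
    }
    return $result;
}
```

If the input contains no links at all, 'pairs' stays empty and the whole string ends up in 'leading'.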

You can use this pattern:

preg_match_all('~<link\b[^:]*:\s*\K(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]++)~i',
               $txt, $matches, PREG_SET_ORDER);

foreach($matches as $match) {
    printf("<br/>link: %s\n<br/>text: %s", $match['link'], $match['text']);
}
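
Because the text group uses [^<]++, it needs at least one character after the closing >, so a link followed immediately by another link (or by the end of the string) is not matched. A possible variant, sketched under the assumption that empty text is acceptable: the group becomes [^<]*+ and the i modifier is used since the sample input writes <Link with a capital L. The function name extractLinkPairs is hypothetical:

```php
<?php
// Hypothetical helper: returns a list of ['link' => ..., 'text' => ...] pairs.
function extractLinkPairs(string $txt): array
{
    // [^<]*+ lets the text part be empty; the i modifier makes <link match <Link.
    $pattern = '~<link\b[^:]*:\s*(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]*+)~i';
    preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $match) {
        $pairs[] = ['link' => $match['link'], 'text' => trim($match['text'])];
    }
    // With no links in the input, $matches is empty and so is the result.
    return $pairs;
}
```

Text that appears before the first link is ignored by this approach, which fits the question as asked but may not fit every input.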
