I have found many topics on extracting all URLs from a string, and others on detecting URLs with a specific pattern, but none that do both. Sorry, I am a bit rusty with regex. Can someone please help?

Here is what I want:

$str = <<<EOF
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

Basically, I want to extract all URLs inside the $str variable that contain the pattern /products/.

I tried this for the URL extraction: /\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i. It matches every URL, though, and I only want the ones containing that pattern, not the others.

4 Comments

  • You are not matching /products/, so you could add it, right? \b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*\/products\/[-a-z0-9+&@#\/%=~_|] Commented Feb 21, 2022 at 13:02
  • Yes, but it only extracts a single character after the match. So, if my string is http://example.com/products/1/abc, it just pulls up to the 1 and not the entire URL. Commented Feb 21, 2022 at 13:12
  • Then you can add the optional character class after it as well: \b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*\/products\/[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|] regex101.com/r/KatX8u/1 Commented Feb 21, 2022 at 13:14
  • Excellent, works like a charm. This was exactly what I was looking for. Thanks a lot! Commented Feb 21, 2022 at 13:25
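
To make the difference between the two suggested patterns concrete, here is a minimal sketch that runs both against the same text. The sample sentence and the variable names ($text, $short, $full) are made up for illustration; the two patterns are the ones from the comments, wrapped in / delimiters with the i flag from the question.

```php
<?php
// Hypothetical sample text containing a URL with more path after /products/1.
$text = 'See http://example.com/products/1/abc for details';

// First suggestion: exactly one character is allowed after /products/.
$short = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*\/products\/[-a-z0-9+&@#\/%=~_|]/i';

// Second suggestion: the optional character class is repeated after /products/,
// so the rest of the URL is consumed as well.
$full = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*\/products\/[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

preg_match($short, $text, $m1);
preg_match($full, $text, $m2);

echo $m1[0], "\n"; // http://example.com/products/1
echo $m2[0], "\n"; // http://example.com/products/1/abc
```

The first pattern stops after a single character because nothing in it can match the remaining path, which is the behaviour described in the second comment.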

2 Answers

You can repeat all the allowed characters before and after matching /products/ using the same optional character class. As the character class is quite long, you could shorten the notation by wrapping it in a capture group and recursing the first subpattern with (?1).

Note that you don't have to escape the forward slash if you use a different delimiter.

// The i flag keeps the match case-insensitive, as in the pattern from the question.
$re = '`\b(?:(?:https?|ftp)://|www\.)([-a-z0-9+&@#/%?=~_|!:,.;]*)/products/(?1)[-a-z0-9+&@#/%=~_|]`i';

$str = <<<EOF
  http://example.com/products/1/abc
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => http://example.com/products/1/abc
    [1] => http://example.com/products/1
)
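
If the (?1) recursion feels opaque, the same pattern can also be assembled with plain string concatenation. The following is only a sketch of that idea, not part of the original answer: the variable name $chars is hypothetical, and the i flag is an assumption carried over from the pattern in the question.

```php
<?php
// Shared optional character class, stored once instead of being recursed with (?1).
// $chars is a hypothetical helper name used only for this sketch.
$chars = '[-a-z0-9+&@#/%?=~_|!:,.;]*';

// Same structure as the pattern above: scheme or www., allowed characters,
// the literal /products/, allowed characters again, and a final character
// from the stricter class so trailing punctuation is not included.
$re = '`\b(?:(?:https?|ftp)://|www\.)' . $chars . '/products/' . $chars . '[-a-z0-9+&@#/%=~_|]`i';

$str = <<<EOF
  http://example.com/products/1/abc
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

preg_match_all($re, $str, $matches);
print_r($matches[0]);
```

For the sample string this prints the same two /products/ URLs as the recursive version. The trade-off is a longer compiled pattern in exchange for not relying on subpattern recursion.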

2 Comments

I am not sure about this particular option, @The fourth bird. I tried it. It pulls the right URLs, but does not extract them whole. Am I doing something wrong here: regex101.com/r/z06ETA/1?
@PratipGhosh It matches the same as the full pattern that is in the comments. See 3v4l.org/iACY8

Besides the answer from "The fourth bird", I am proposing a hybrid solution that uses both a regex and classic string operations. It provides a helper function with some additional options, e.g. to get different results at runtime without changing the regex.

<?php

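// Returns every http(s) URL found in $str that contains $pattern.
// Pass false (or an empty string) as $pattern to return all URLs.
function GetURL($str, $pattern='/products/')
{
    $temp = array();
    // Extract all http(s) URLs first; the filtering below uses plain string functions.
    preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $str, $match);
    foreach ($match[0] as $link)
    {
        // Keep the link when no filter is given or when it contains the filter text.
        if (!$pattern || strpos($link, $pattern) !== false)
            $temp[] = $link;
    }
    return $temp;
}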

$str = <<<EOF
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

print_r(GetURL($str)); //Urls only with /products/ inside
print_r(GetURL($str, '/order/')); //Urls only with /order/ inside
print_r(GetURL($str, false)); //All urls

?>

OUTPUT

Array ( [0] => http://example.com/products/1 ) 
Array ( [0] => http://example.com/order/1 ) 
Array ( 
   [0] => http://example.com/products/1 
   [1] => http://example.com/order/1 
)
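
The same extract-then-filter idea can be written with array_filter instead of an explicit loop. This is a sketch only: the function name filterUrls and its nullable $needle parameter are hypothetical, the regex is copied from the helper above, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical variant of the helper above: extract every http(s) URL,
// then keep only those containing $needle. Pass null to get all URLs.
function filterUrls(string $str, ?string $needle = '/products/'): array
{
    // Same extraction regex as in the answer above.
    preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $str, $match);

    if ($needle === null || $needle === '') {
        return $match[0];
    }

    // array_filter preserves keys, so array_values reindexes the result from 0.
    return array_values(array_filter(
        $match[0],
        function ($link) use ($needle) {
            return strpos($link, $needle) !== false;
        }
    ));
}

$str = <<<EOF
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

print_r(filterUrls($str));            // URLs containing /products/
print_r(filterUrls($str, '/order/')); // URLs containing /order/
print_r(filterUrls($str, null));      // all URLs
```

Using null rather than false for "no filter" is a design choice in this sketch so the parameter can keep a string type; the behaviour for the three calls matches the output shown above.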
