0

A string contains many URLs. How can I get only the URLs that do not start with [url] and do not end with [/url]?

Example:

A string contain many urls: https://stackoverflow.com/1 [url]https://stackoverflow.com/2[/url] https://stackoverflow.com/3 [url]https://stackoverflow.com/4[/url], how to get match urls?

In this sample, only https://stackoverflow.com/1 and https://stackoverflow.com/3 should be returned.

3
  • url3 ends with |/url] your question is it only for this example or is it in general ? Commented Jun 24, 2016 at 13:13
  • Only for this example Commented Jun 24, 2016 at 13:21
  • @KalaMei do all your URLs start with http? See my answer below if so. Commented Jun 24, 2016 at 14:54

4 Answers

1

I will focus on the regular expression itself, since it is the key to extracting the URLs. Here it is:

 (?!\[url\])\s+\bhttp:\/\/stackoverflow.com\/\d\s+(?<!\[\/url\])

You can see the result at this URL, using the PHP function preg_match_all.

Before that, let's go through every part of it (the same breakdown is available on that site):

(?!\[url\])\s+\bhttp:\/\/stackoverflow.com\/\d\s+(?<!\[\/url\])
  • (?!\[url\]) Negative Lookahead - Assert that it is impossible to match the regex below

    • \[ matches the character [ literally
    • url matches the characters url literally (case sensitive)
    • \] matches the character ] literally

  • \s+ match any white space character [\r\n\t\f ] Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]

  • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)

  • http: matches the characters http: literally (case sensitive)

  • \/ matches the character / literally

  • \/ matches the character / literally

  • stackoverflow matches the characters stackoverflow literally (case sensitive)

  • . matches any character (except newline)

  • com matches the characters com literally (case sensitive)

  • / matches the character / literally

  • \d match a digit [0-9]

  • \s+ match any white space character [\r\n\t\f ] Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]

  • (?<!\[\/url\]) Negative Lookbehind - Assert that it is impossible to match the regex below

    • \[ matches the character [ literally
    • \/ matches the character / literally
    • url matches the characters url literally (case sensitive)
    • \] matches the character ] literally

Finally, call the PHP function as follows (note that the pattern must be wrapped in delimiters, here /):

preg_match_all('/(?!\[url\])\s+\bhttp:\/\/stackoverflow.com\/\d\s+(?<!\[\/url\])/', $input_lines, $output_array);

$input_lines is the variable that holds your string

$output_array is the array that receives the matched URLs


2 Comments

Thanks for your reply, but this regex rule will return all urls (1, 2, 3, 4), you can see the test at regex101.com/r/tB5lY0/2 or sandbox.onlinephpfunctions.com/code/…
You need to use the negative lookbehind and lookahead! I will edit my answer.
0

(?<!\[url\])(?![^\s]+\[\/url\])http[^\s]*

This will grab all the URLs not enclosed in the tags you mentioned ([url] and [/url]). Note that it works for any URL starting with http, not just the one you listed (i.e. http://stackoverflow.com), which I think is what you want. You can see an explanation of each part and a live demo on Regex101: https://regex101.com/r/wN9aX0/3
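As a rough sketch of how this pattern behaves on the sample string from the question, here it is in JavaScript. The answer itself does not name a language, so this choice is an assumption, as are the variable names and the added g flag. Lookbehind requires an engine that supports it, such as a recent Node.js or browser.

```javascript
// Pattern from the answer above, with the g flag added so all matches are returned.
// (?<!\[url\])          : the URL must not be directly preceded by [url]
// (?![^\s]+\[\/url\])   : the run of non-whitespace starting here must not reach [/url]
// http[^\s]*            : the URL itself, up to the next whitespace
const pattern = /(?<!\[url\])(?![^\s]+\[\/url\])http[^\s]*/g;

// Sample string from the question.
const input =
  "A string contain many urls: https://stackoverflow.com/1 " +
  "[url]https://stackoverflow.com/2[/url] https://stackoverflow.com/3 " +
  "[url]https://stackoverflow.com/4[/url], how to get match urls?";

// With the g flag, String.prototype.match returns every match, or null if there are none.
const untaggedUrls = input.match(pattern) || [];

console.log(untaggedUrls);
// [ 'https://stackoverflow.com/1', 'https://stackoverflow.com/3' ]
```

Because the URL is taken to be everything up to the next whitespace, punctuation that directly follows an untagged URL (a trailing comma, for instance) would be included in the match.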

2 Comments

This would also match http://stackoverflow.com/5[/url], i.e. if it doesn't start with [url] but ends with [/url] - not sure if this is OK with the OP.
@MariaDeleva Oops, that was my bad, my negative lookahead was incorrect, fixed now.
0

This pattern is a little complicated and probably won't work in every case, but it will work in most. If it fails on a case you need, I can tweak it further:

(?<!(\[url\]))[\s.:]((http|https)(:\/\/))?([[:alnum:]\-_]*)(([\.])([[:alnum:]\-_]*)){1,}([\/]([[:alnum:]\-_]*))*[.:;\s]((?!\[\/url\]))


0

This may help you:

var patt =/(?:\bhttp:\/\/stackoverflow.com\/\d{1,})(?!\[\/url\])/;

Example:

<html>
<head></head>
    <body>
         <script>
             var patt =/(?:\bhttp:\/\/stackoverflow.com\/\d{1,})(?!\[\/url\])/;
             var str = "http://stackoverflow.com/2";
             if(patt.test(str))
                 alert("Valid");
             else
                 alert("Invalid");
        </script>
    </body>
</html>
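To see what this pattern returns on a whole string rather than a single URL, here is a minimal sketch outside the browser. The g flag and the input string are assumptions for illustration: the pattern is hard-coded to http://, so the sample below is a hypothetical variant of the question's string with https:// replaced by http://.

```javascript
// Pattern from the answer above, with the g flag added (an assumption) to collect every match.
const patt = /(?:\bhttp:\/\/stackoverflow.com\/\d{1,})(?!\[\/url\])/g;

// Hypothetical variant of the question's sample that uses http:// instead of https://.
const str =
  "http://stackoverflow.com/1 [url]http://stackoverflow.com/2[/url] " +
  "http://stackoverflow.com/3 [url]http://stackoverflow.com/4[/url]";

// With the g flag, match returns all matches, or null if there are none.
const found = str.match(patt) || [];

console.log(found);
// [ 'http://stackoverflow.com/1', 'http://stackoverflow.com/3' ]
```

Two limitations are worth keeping in mind. The pattern only checks what follows the URL, so a URL preceded by [url] but not followed by [/url] would still match. And because \d{1,} can backtrack, a multi-digit URL such as http://stackoverflow.com/42[/url] yields the partial match http://stackoverflow.com/4.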
