Being driven up the wall here. I've looked at other posts where there's a regex negative lookahead but for some reason I can't get it to work. Probably missing something very easy but after a couple of hours trying various options I need help!

So, I'm trying to find a pattern for preg_replace which searches through code for href links which IGNORES any containing a particular domain AND also IGNORES any which include a js reference called data-fancybox.

In the following it must ignore the first 3 which contain data-fancybox and also ignore #4 the youtube link. It should only find the last 2.

  1. <a href="https://youtube.com/" data-fancybox>
  2. <a href="https://example.com/" data-fancybox>
  3. <a href="https://vimeo.com/" data-fancybox>
  4. <a href="https://youtube.com/">
  5. <a href="https://example.com/">
  6. <a href="https://vimeo.com/">

When I try this:

<a href=.*((youtube|data-fancybox)).*>

It picks out the first 4 and ignores the last two. But when I try to turn this negative so it only picks out the last 2, it ends up picking them all out:

<a href=.*(?!(youtube|data-fancybox)).*>

Any help appreciated!!

  • why not ? if (!preg_match($pattern,$string)): Commented Apr 18, 2020 at 14:19
  • maybe this could help regular-expressions.info/lookaround.html Commented Apr 18, 2020 at 15:40
  • I can't use 'if (!preg_match($pattern,$string)):' because the preg_match needs to go through a large block of HTML text containing various href links; it needs to pick out certain links and adjust them before returning the block of HTML Commented Apr 19, 2020 at 17:40

1 Answer

A negative lookahead is only tested at the position where it sits, so you should spell out the characters that come immediately before the text you want to exclude (youtube.com, for instance). To detect links that are not youtube.com:

// The lookahead sits right after https://, so it is tested at exactly one position.
// The dot in youtube\.com is escaped so it only matches a literal dot.
$pattern = '/(<a href=\"https:\/\/(?!youtube\.com).*)/';
$strings = [
    '<a href="https://youtube.com/" data-fancybox>',
    '<a href="https://example.com/" data-fancybox>',
    '<a href="https://vimeo.com/" data-fancybox>',
    '<a href="https://youtube.com/">',
    '<a href="https://example.com/">',
    '<a href="https://vimeo.com/">',
];
foreach ($strings as $key => $str) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not
    if (preg_match($pattern, $str)) {
        echo "$key valid<br>";
    } else {
        echo "$key not valid<br>";
    }
}
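
To see why the position of the lookahead matters, here is a small sketch that runs the pattern from the question and an anchored variant against link #4. The variable names are mine, and the anchored pattern is only an illustration of the idea above, not a complete solution:

```php
<?php
$link = '<a href="https://youtube.com/">';

// Floating lookahead: the first .* can backtrack to any position where
// "youtube" does not follow (for example just before the final >), so it matches.
$floating = '/<a href=.*(?!(youtube|data-fancybox)).*>/';

// Anchored lookahead: tested only right after https://, so there is
// no other position the engine can fall back to.
$anchored = '/<a href="https:\/\/(?!youtube\.com).*>/';

$floatingResult = preg_match($floating, $link);
$anchoredResult = preg_match($anchored, $link);

var_dump($floatingResult); // int(1)
var_dump($anchoredResult); // int(0)
```

The floating version succeeds because a negative lookahead only has to hold at one position somewhere in the string, and .* is free to choose that position.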

To detect links that do not have data-fancybox:

$pattern = '/(.*\" (?!data-fancybox)|.*\">)/';
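
As a quick check, the sketch below runs that pattern over the six links from the question and collects the ones it accepts. The array and variable names are just for illustration. Note that the pattern relies on data-fancybox being the first thing after the closing quote and space of href, which holds for these samples but is an assumption about the real markup:

```php
<?php
$links = [
    '<a href="https://youtube.com/" data-fancybox>',
    '<a href="https://example.com/" data-fancybox>',
    '<a href="https://vimeo.com/" data-fancybox>',
    '<a href="https://youtube.com/">',
    '<a href="https://example.com/">',
    '<a href="https://vimeo.com/">',
];

// Either a quote and space NOT followed by data-fancybox, or a quote directly before >.
$pattern = '/(.*\" (?!data-fancybox)|.*\">)/';

$matched = [];
foreach ($links as $i => $link) {
    if (preg_match($pattern, $link) === 1) {
        $matched[] = $i + 1; // 1-based, same numbering as in the question
    }
}

echo implode(', ', $matched), PHP_EOL; // 4, 5, 6
```

This pattern on its own still accepts the youtube link (#4), since it only looks at data-fancybox.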
14 Comments

That's a step closer, but does this mean it's not possible to have a single pattern which rejects both 'youtube' and 'data-fancybox', since we can't specify the string which occurs before 'data-fancybox'?
When you write something like q(?!youtube.com), you are asking for a q character that is not followed by youtube.com. As far as I know, you always need to specify this character, in this case q. The problem with your pattern is that you don't specify a character; you simply say .*, and there are several positions where .* can stop that are not followed by youtube.com. That is the reason all the strings are valid.
to detect data-fancybox you could try $pattern = '/.*\" (data-fancybox).*/';
to detect with no data-fancybox: $pattern = '/(.*\" (?!data-fancybox)|.*\">)/';
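doing an OR with both results: $pattern = '/<a href=\"https:\/\/(?!youtube.com).*|(.*\" (?!data-fancybox)|.*\">)/'; note that an OR accepts a link that passes either test, so it only rejects links that fail both (link 1 in the question)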
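
For what the question actually asks (skip a link if it is a youtube link or if it carries data-fancybox), one option is to put both negative lookaheads into a single pattern, each at a fixed position, and run it over the whole HTML block with preg_replace. This is a minimal sketch, not a definitive solution. It assumes href is the first attribute, attribute values use double quotes, and class="external" is only a hypothetical stand-in for whatever adjustment the links need:

```php
<?php
// Sample block of HTML; the six links are the ones from the question.
$html = <<<HTML
<a href="https://youtube.com/" data-fancybox>
<a href="https://example.com/" data-fancybox>
<a href="https://vimeo.com/" data-fancybox>
<a href="https://youtube.com/">
<a href="https://example.com/">
<a href="https://vimeo.com/">
HTML;

// ~ is the delimiter, so the slashes in https:// need no escaping.
// Group 1: <a href="..."> opening part, rejected if the URL starts with youtube.com.
// Group 2: the rest of the tag, rejected if data-fancybox appears before the closing >.
$pattern = '~(<a\s+href="(?!https?://(?:www\.)?youtube\.com)[^"]*")'
         . '((?![^>]*data-fancybox)[^>]*>)~i';

// Hypothetical adjustment: add a class to every link that passes both checks.
$result = preg_replace($pattern, '$1 class="external"$2', $html);

echo $result, PHP_EOL;
```

With the sample block, only links 5 and 6 get the extra attribute and the other four come back unchanged. For anything beyond simple, predictable markup, an HTML parser such as DOMDocument is usually a safer tool than a regex.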