
I need some help with one thing I have to do. I have two arrays of URLs, for example:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

I need to create three arrays of URLs from them. The first will contain the URLs common to both arrays. The other two will be the original arrays without the common URLs, except that if the same URL appears in both initial arrays and the only difference is http vs. https, I need to keep that URL in only one array.

So from my two example arrays I need to get the following arrays:

$commonUrls = ['https://test.com/']; // because this is the only URL present in both arrays

$urls = ['http://example.com/', 'https://google.com/']; // http://example.com/ stays in this array and is removed from the second one, because the second array has the same URL and the only difference is https

$urlsFromOtherSource = ['https://facebook.com/']; // https://example.com/ is removed from this array because the same URL is in the first array and the only difference is http

I tried to work out how to compare these arrays and catch the http/https difference, but it is not easy for me. My code looks like this:

$urls = ['https://test.com/', 'http://example.com/', 'https://google.com/'];

$urlsFromOtherSource = ['https://test.com/', 'https://example.com/', 'https://facebook.com/'];

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays
$urls = array_diff($urls, $commonUrls); // remove the common URLs from the first array
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $commonUrls); // remove the common URLs from the second array

// Landing pages (no scheme, no "www.") of the remaining second-source URLs
$landingPageArray = [];
foreach ($urlsFromOtherSource as $url) {
    $landingPageArray[] = preg_replace(["#^https?://#", "#^www\.#"], ["", ""], $url);
}

// URLs from the first array whose landing page also appears in the second source
$httpDifference = [];
foreach ($urls as $url) {
    $landingPage = preg_replace(["#^https?://#", "#^www\.#"], ["", ""], $url);
    if (in_array($landingPage, $landingPageArray)) {
        $httpDifference[] = $url;
    }
}
// I have no idea how to remove from $urlsFromOtherSource the URLs that are also in $urls when the only difference is http vs. https
$urlsFromOtherSource = array_diff($urlsFromOtherSource, $httpDifference);

So all I need is to compare the arrays and remove from the second array the URLs that are also in the first array when the only difference between them is http vs. https. Maybe someone can help me find an algorithm for that.

UPDATE: I also need to remove a URL from urlsFromOtherSource if I have it in commonUrls:

commonUrls: array(1) {
  [0]=>
  string(20) "http://www.test.com/"
}



urlsFromOtherSource: array(1) {
  [2]=>
  string(16) "http://test.com/"
}

So I need to remove this URL from urlsFromOtherSource. The code should automatically compare only the landing page, whether the URL starts with http://www, www, or just http://; those prefixes should not be compared in my arrays.

  • Check out array_diff(): php.net/manual/en/function.array-diff.php Commented Sep 10, 2017 at 17:18
  • Hi, I know this function, but it does not help me. I need an algorithm that can help me compare similar URLs or delete a similar URL from one array. I think you did not read my whole post, because these URLs are not the same; I need to check the http/https difference. Commented Sep 10, 2017 at 17:20
  • I hope this helps: $commonUrls = array_intersect($urls,$urlsFromOtherSource); $urls = array_diff($urls, $commonUrls); $urlsFromOtherSource=array_diff($urlsFromOtherSource, $commonUrls); $urlsFromOtherSource=array_diff($urlsFromOtherSource, $urls) Commented Sep 10, 2017 at 18:25
  • ... if your problem is that 'http' and 'https' are the same for you, then you must, as you were doing, use preg_replace on every URL, e.g.: preg_replace('/http:/i', 'https:', $url); Commented Sep 10, 2017 at 18:41
  • array_diff will not work for that, because I have example.com and example.com Commented Sep 10, 2017 at 18:46

1 Answer


You can supply your own comparison function by using the callback ("u") variants of the array functions, such as array_udiff and array_uintersect. Apply preg_replace inside the callback so that the URLs are compared without the http/https scheme and without an optional leading www.

$commonUrls = array_intersect($urls, $urlsFromOtherSource); // URLs present in both arrays

$urls = array_diff($urls, $commonUrls);

$urlsFromOtherSource = array_udiff(array_diff($urlsFromOtherSource, $commonUrls), $urls, function ($a, $b) {
  return strcmp(preg_replace('|^https?://(www\\.)?|', '', $a), preg_replace('|^https?://(www\\.)?|', '', $b));
});

This yields:

commonUrls: array(1) {
  [0]=>
  string(17) "https://test.com/"
}

urls: array(2) {
  [1]=>
  string(19) "http://example.com/"
  [2]=>
  string(19) "https://google.com/"
}

urlsFromOtherSource: array(1) {
  [2]=>
  string(21) "https://facebook.com/"
}
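
The code above filters $urlsFromOtherSource only against $urls, so the case from the question's UPDATE (a URL in $commonUrls that differs from one in $urlsFromOtherSource only by "www.") may still slip through. Below is a minimal sketch of one way to cover it, assuming that exact matches should still define $commonUrls. The helper names landingPage and splitUrls are hypothetical and not part of the original answer, and the input arrays are made up to exercise the UPDATE case.

```php
<?php
// Hypothetical helper: reduce a URL to its landing page by dropping the
// scheme (http/https) and an optional leading "www.".
function landingPage(string $url): string
{
    return (string) preg_replace('#^https?://(www\.)?#i', '', $url);
}

// Hypothetical wrapper: returns [common, onlyInFirst, onlyInSecond].
function splitUrls(array $urls, array $urlsFromOtherSource): array
{
    // Comparison callback that looks only at the landing page.
    $compare = function (string $a, string $b): int {
        return strcmp(landingPage($a), landingPage($b));
    };

    // Exact matches go to the common array, as in the question.
    $commonUrls = array_intersect($urls, $urlsFromOtherSource);
    $urls = array_diff($urls, $commonUrls);

    // array_udiff() accepts several arrays before the callback, so the
    // second source is filtered against both the common URLs and the
    // remaining first-array URLs, ignoring http/https and "www.".
    $urlsFromOtherSource = array_udiff($urlsFromOtherSource, $commonUrls, $urls, $compare);

    // Reindex so the results are plain lists.
    return [array_values($commonUrls), array_values($urls), array_values($urlsFromOtherSource)];
}

// Assumed sample data: the second source holds the same site three ways.
[$commonUrls, $urls, $urlsFromOtherSource] = splitUrls(
    ['http://www.test.com/', 'http://example.com/', 'https://google.com/'],
    ['http://www.test.com/', 'http://test.com/', 'https://example.com/', 'https://facebook.com/']
);

var_dump($commonUrls, $urls, $urlsFromOtherSource);
```

With this sample input, $commonUrls holds only http://www.test.com/, $urls keeps http://example.com/ and https://google.com/, and $urlsFromOtherSource is reduced to https://facebook.com/. Keeping the normalization in a single helper means the rule for what counts as "the same URL" lives in one place if it needs to change later.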

3 Comments

But please help me with my code after the UPDATE in my first post. I also need to remove 'www'; I tried to combine that with your code, but I failed.
Could you help me one last time with that? I always need to check both http and www, and not repeat URLs in any array.
I updated my answer so it now ignores both http/https and www. when comparing.
