
I wrote this function to convert all URLs on one specific domain (mywebsite.com) into links and to replace every other URL with @@@spam@@@.

function get_global_convert_all_urls($content) {
  $content = strtolower($content);
  $replace = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?/im";
  preg_match_all($replace, $content, $search);
  $total = count($search[0]);
  for($i=0; $i < $total; $i++) {
    $url = $search[0][$i];
    if(preg_match('/mywebsite.com/i', $url)) {
      $content = str_replace($url, '<a href="'.$url.'">'.$url.'</a>', $content);            
    } else {
      $content = str_replace($url, '@@@spam@@@', $content); 
    }
  } 

  return $content;
}

The only problem I can't solve is that the regex does not stop at the space when there are two URLs on one line.

$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

Result:

<a href="http://www.mywebsite.com/index.html http://www.others.com/index.html">http://www.mywebsite.com/index.html http://www.others.com/index.html</a>

How can I get the result below?

<a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@   

I have tried adding (\s|$) at the end of the regex, but with no luck:

/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?(\s|$)/im
  • I think the link above href="http://www.http://www.mywebsite.com" is also incorrect Commented Mar 26, 2016 at 9:51
  • strange ... I'm receiving this result '<a href="http://www.http://www.mywebsite.com">http://www.mywebsite.com</a> @@@spam@@@' using your current regex Commented Mar 26, 2016 at 9:53
  • @RomanPerekhrest Oops, sorry. Please try adding /index.html Commented Mar 26, 2016 at 10:04

3 Answers


Edited based on change in your question.

The problem is the .* at the end of your regex: it matches everything up to the end of the line, spaces included. My suggestion is to replace it with a more precise expression. I cooked this up quickly, so you'll want to run some tests to verify your cases. =)

$matches = null;
$returnValue = preg_match_all('!(?:http|https)?(?:\\:\\/\\/)?(?:www.)?(([A-Za-z0-9-]+\\.)*[A-Za-z0-9-]+\\.[A-Za-z]+)(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\\-\\._\\?\\,\\\'/\\\\\\+&%\\$#\\=~])*[^\\.\\,\\)\\(]!', 'mywebsite.com/index.html others.com/index.html', $matches);

Results in:

array (
  0 => 
  array (
    0 => 'mywebsite.com/index.html ',
    1 => 'others.com/index.html',
  ),
  1 => 
  array (
    0 => 'mywebsite.com',
    1 => 'others.com',
  ),
  2 => 
  array (
    0 => '',
    1 => '',
  ),
  3 => 
  array (
    0 => '',
    1 => '',
  ),
  4 => 
  array (
    0 => 'l',
    1 => 'm',
  ),
)

1 Comment

Oops, sorry. Please try adding /index.html to both URLs.

Change the last element of the regex, (?:\/.*)?, to \S*.

Your regex matches every character up to the end of the line, including spaces. \S* matches only non-whitespace characters, so each match stops at the first space.
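To make the difference concrete, here is a minimal sketch that runs the question's pattern and the same pattern with only its last element changed to \S* against the sample input. The variable names are made up for illustration.

```php
<?php
$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

// Pattern from the question: the trailing (?:\/.*)? runs to the end of the line.
$greedy = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)(?:\/.*)?/im";
// Same pattern, with only the last element replaced by \S*.
$bounded = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+\.[A-Za-z]+)\S*/im";

preg_match_all($greedy, $content, $greedyMatches);
preg_match_all($bounded, $content, $boundedMatches);

// The first pattern swallows both URLs in a single match.
echo count($greedyMatches[0]), "\n";  // prints 1
// The second pattern stops at the space and yields one match per URL.
echo count($boundedMatches[0]), "\n"; // prints 2
print_r($boundedMatches[0]);
```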

You could also simplify the whole regex to:

$replace = "~(?:https?://)?(?:www\.)?(([A-Z0-9-]+\.)*[A-Z0-9-]+\.[A-Z]+)\S*~im";
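As a rough sketch of how this pattern could be used end to end, the function below does the replacement in a single pass with preg_replace_callback instead of the question's preg_match_all plus str_replace loop. That is a swapped-in technique, not the original approach. The function name, the $allowedHost parameter, the exact host comparison, and the htmlspecialchars escaping are all assumptions made for illustration. Unlike the original, it does not lowercase the content and does not accept subdomains of the allowed host.

```php
<?php
// Hypothetical helper, not from the question: links URLs on $allowedHost,
// replaces every other URL with @@@spam@@@.
function convert_urls_single_pass(string $content, string $allowedHost = 'mywebsite.com'): string
{
    // Simplified pattern from the answer; group 1 is the host without "www.".
    $pattern = "~(?:https?://)?(?:www\.)?(([A-Z0-9-]+\.)*[A-Z0-9-]+\.[A-Z]+)\S*~im";

    $result = preg_replace_callback($pattern, function (array $m) use ($allowedHost): string {
        // Assumption: only an exact, case-insensitive host match counts as allowed.
        if (strcasecmp($m[1], $allowedHost) === 0) {
            // Escape the URL before putting it into HTML.
            $url = htmlspecialchars($m[0], ENT_QUOTES);
            return '<a href="' . $url . '">' . $url . '</a>';
        }
        return '@@@spam@@@';
    }, $content);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $content;
}

echo convert_urls_single_pass("http://www.mywebsite.com/index.html http://www.others.com/index.html");
```

Because each match is replaced exactly once, a URL that appears twice in the text is not rewritten again inside an anchor tag that was already inserted, which can happen with repeated str_replace calls.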


Change the regex pattern so that it also captures the last URL section (/index.html, /index.php).

/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+?\.)?[A-Za-z0-9-]+?\.?[A-Za-z]*?(\/\w+?\.\w+?)?)\b/im

Change your function content as shown below:

$content = "http://www.mywebsite.com/index.html http://www.others.com/index.html";

function get_global_convert_all_urls($content) {
  $content = strtolower($content);
  $replace = "/(?:http|https)?(?:\:\/\/)?(?:www.)?(([A-Za-z0-9-]+?\.)?[A-Za-z0-9-]+?\.?[A-Za-z]*?(\/\w+?\.\w+?)?)\b/im";
  preg_match_all($replace, $content, $search);

  foreach ($search[0] as $url) {
    if(preg_match('/mywebsite.com/i', $url)) {
      $content = str_replace($url, '<a href="'.$url.'">'.$url.'</a>', $content);         
    } else {
      $content = str_replace($url, '@@@spam@@@', $content); 
    }
  } 

  return $content;
}

var_dump(get_global_convert_all_urls($content)); 

The output:

string '<a href="http://www.mywebsite.com/index.html">http://www.mywebsite.com/index.html</a> @@@spam@@@'
