PHP Remove URL from string

Question

If I have a string that contains a URL (for example's sake, we'll call it $url), such as:

$url = "Here is a funny site http://www.tunyurl.com/34934";

How do I remove the URL from the string? The difficulty is that URLs might also show up without the http://, such as:

$url = "Here is another funny site www.tinyurl.com/55555";

There is no HTML present. How would I start a search for http or www and then remove the text/numbers/symbols up to the first space?
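The approach described in the question (find a word that starts with http or www, then drop everything up to the next whitespace) fits in a single preg_replace() call. This is a minimal sketch, assuming a URL always begins with http://, https:// or www.; the function name removeUrls is hypothetical.

```php
<?php
// Hypothetical helper: removes every "word" that starts with http://,
// https:// or www. and runs up to the next whitespace character.
function removeUrls(string $text): string
{
    // \b requires the match to start at a word boundary, so "www" inside a
    // longer word is not matched; \S+ consumes everything up to the next
    // space, tab or newline.
    $stripped = preg_replace('/\b(?:https?:\/\/|www\.)\S+/i', '', $text);

    // Collapse runs of two or more whitespace characters (including
    // newlines) into one space, then trim the ends.
    return trim(preg_replace('/\s{2,}/', ' ', $stripped));
}

echo removeUrls("Here is a funny site http://www.tunyurl.com/34934"), "\n";
echo removeUrls("Here is another funny site www.tinyurl.com/55555"), "\n";
```

Bare domains without a www. prefix (for example tinyurl.com/55555) are not caught by this pattern.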

Are we talking about EXTRACTING URLs from strings or removing the actual link itself? Given $url = "Here is another funny site www.tinyurl.com/55555";, extracting would leave $url unchanged and set $someVar = 'www.tinyurl.com/55555';, while removing would give $url = "Here is another funny site";.
– tomzx, Commented Jul 11, 2009 at 17:56
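The two readings differ only in whether the same pattern goes to preg_match_all() or to preg_replace(). A sketch, assuming the "scheme or www. up to whitespace" rule; $url and $someVar follow the comment (here $someVar is an array of matches), while $pattern and $cleaned are hypothetical names.

```php
<?php
$url = "Here is another funny site www.tinyurl.com/55555";

// Assumed URL rule for both tasks: a scheme or "www." up to the next whitespace.
$pattern = '/\b(?:https?:\/\/|www\.)\S+/i';

// Extracting: collect the URLs and leave $url untouched.
preg_match_all($pattern, $url, $matches);
$someVar = $matches[0];            // ['www.tinyurl.com/55555']

// Removing: drop the URLs and trim the leftover whitespace.
$cleaned = trim(preg_replace($pattern, '', $url));

print_r($someVar);
echo $cleaned, "\n";
```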

Eddy · Accepted Answer · 2009-07-11 18:08:59Z

37

I re-read the question; here is a function that should work as intended:

function cleaner($url) {
  // Split on any run of whitespace (spaces, tabs, newlines), not just ' '
  $U = preg_split('/\s+/', trim($url));

  foreach ($U as $k => $u) {
    // Treat a word as a URL if it contains "http" or at least one dot
    if (stristr($u, 'http') || strpos($u, '.') !== false) {
      unset($U[$k]);
    }
  }
  return implode(' ', $U);
}

$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);

Edit #2/#3 (I must be bored): here is a version that only removes a word if it contains a known top-level domain (TLD):

function containsTLD($string) {
  // TLD list from the original answer; it predates newer generic TLDs such as .app.
  // Factored as \.(A|B|...) so every entry, including AC, requires a leading dot.
  $tlds = 'AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|'
        . 'BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|'
        . 'DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|'
        . 'HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|'
        . 'MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|'
        . 'PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|'
        . 'TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|'
        . 'XN--0ZWM56D|XN--11B5BS3A9AJ6G|XN--80AKHBYKNJ4F|XN--9T4B11YI5A|XN--DEBA0AD|XN--G6W251D|XN--HGBK6AJ7F53BBA|'
        . 'XN--HLCJ6AYA9ESC7A|XN--JXALPDLP|XN--KGBECHTV|XN--ZCKZAH|YE|YT|YU|ZA|ZM|ZW';
  // A TLD only counts after a dot and before "/" or the end of the word,
  // e.g. "badsite.com" or "somesite.ca/worse.jpg" (case-insensitive)
  return preg_match('/\.(' . $tlds . ')($|\/)/i', $string) === 1;
}

function cleaner($url) {
  // Split on any run of whitespace (spaces, tabs, newlines), not just ' '
  $U = preg_split('/\s+/', trim($url));

  foreach ($U as $k => $u) {
    // Only run the TLD regex on words that contain a dot
    if (strpos($u, '.') !== false && containsTLD($u)) {
      unset($U[$k]);
    }
  }
  return implode(' ', $U);
}


$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);

outputs:

Cleaned: Here is another funny site badurl.badone but this and and

edited Jul 11, 2009 at 18:08

answered Jul 11, 2009 at 15:03

Eddy


5 Comments

mrpatg Over a year ago

Thanks for taking the time to expand on this.

Jay Brunet Over a year ago

Should "this and and" be blue? I don't want to edit actual output ;-)

Tyler V. Over a year ago

This won't catch URLs that are adjacent to other characters like quotes. So <a href="http://www.google.com"> won't be correctly filtered to remove the URL. You can use strip_tags, but if that's not what you want, you'll need to tweak this.

Guy Over a year ago

For some reason this breaks if there is a newline before the url. For example: "Some text and some more i.imgur.com/aaa.png" works fine but if there is a \n after the word 'more' (instead of the space) the result is "Some text and some". Any suggestions? Thanks!

Guy Over a year ago

I replaced $U = explode(' ',$url); with $U = preg_split('/\s+/', $url); and that seems to do the trick.
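On Tyler V.'s point about URLs sitting next to quotes: one workaround is to end the match at whitespace, quotes or angle brackets instead of only at a space. This is a sketch under the assumption that a URL never contains those characters; removeQuotedUrls is a hypothetical name, not part of the answer.

```php
<?php
// Hypothetical helper: the character class [^\s"'<>] stops the match at
// whitespace, quotes and angle brackets, so the surrounding markup or
// quotation marks are left in place.
function removeQuotedUrls(string $text): string
{
    return preg_replace('/\b(?:https?:\/\/|www\.)[^\s"\'<>]+/i', '', $text);
}

echo removeQuotedUrls('<a href="http://www.google.com">Google</a>'), "\n";
echo removeQuotedUrls('He said "www.example.com" twice'), "\n";
```

If you only want the visible text of HTML, calling strip_tags() first removes the whole tag, including the URL in its href.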

ronalchn · Accepted Answer · 2012-11-12 06:43:37Z

10

$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
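This pattern only matches URLs that have a scheme, so www.tinyurl.com/55555 from the question is left alone. A sketch that adds a www. alternative (an assumption, not part of this answer) while keeping the answer's two character classes; stripLinks is a hypothetical name.

```php
<?php
// The answer's pattern with an extra, hypothetical "www." alternative.
// The last character class excludes . , ; : ! ? and -, so punctuation
// that ends a sentence is not swallowed along with the URL.
function stripLinks(string $string): string
{
    $pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.)'
             . '[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    return preg_replace($pattern, '', $string);
}

echo stripLinks("See http://www.tunyurl.com/34934. Or www.tinyurl.com/55555!"), "\n";
```

The example prints "See . Or !": both URLs are gone, but the trailing period and exclamation mark survive.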

edited Nov 12, 2012 at 6:43

ronalchn


answered Sep 11, 2011 at 16:16

Mike


hao · Accepted Answer · 2009-07-11 15:40:40Z

2

Parsing URLs out of free text is hard, so reusing existing, heavily tested code is a better bet than writing your own and missing edge cases. For example, take a look at how Django's urlize works: it finds URLs in text and wraps them in anchors. You could port it to PHP and, instead of wrapping each URL in an anchor, simply delete it from the text.
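A rough sketch in the spirit of urlize (not a port of Django's code): look at the text one word at a time, ignore wrapping punctuation when deciding whether a word is a URL, and drop the words that are. The function name and both regexes are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper, loosely modelled on a word-by-word urlize approach.
function removeUrlsWordByWord(string $text): string
{
    $kept = [];
    foreach (preg_split('/\s+/', trim($text)) as $word) {
        // Ignore wrapping punctuation: "(www.example.com)," -> "www.example.com"
        $core = trim($word, '()<>"\'.,:;!?');

        $looksLikeUrl =
            preg_match('/^(https?:\/\/|www\.)/i', $core) === 1 ||
            // Bare domain with a common TLD, optionally followed by a path;
            // [^@\/] keeps e-mail addresses such as me@example.com from matching
            preg_match('/^\w[^@\/]*\.(com|net|org|edu|gov)(\/\S*)?$/i', $core) === 1;

        if (!$looksLikeUrl) {
            $kept[] = $word;   // URLs are dropped together with their punctuation
        }
    }
    return implode(' ', $kept);
}

echo removeUrlsWordByWord("Try (www.tinyurl.com/55555) or example.org, maybe."), "\n";
```

Dropping the punctuation along with the URL keeps the sketch short; a fuller port would reattach trailing commas and periods.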

answered Jul 11, 2009 at 15:40

hao


Ryan Harne · Accepted Answer · 2012-07-24 05:26:21Z

0

Thanks, Mike.

I updated it a bit, since it returned a notice error:

'/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i'

$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);

answered Jul 24, 2012 at 5:26

Ryan Harne


Brian Hollon · Accepted Answer · 2015-09-25 16:19:06Z

0

$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';

$clean_url = clean($url, $replace, $with);
echo $clean_url;

// Replace every space-separated word that contains one of the
// markers in $replace (case-insensitive) with $with.
function clean($url, $replace, $with) {
  $markers = explode(' ', $replace);
  $words = explode(' ', $url);

  foreach ($words as $i => $word) {
    foreach ($markers as $marker) {
      // stripos() returns false when not found and 0 for a match at the start
      if (stripos($word, $marker) !== false) {
        $words[$i] = $with;
        break;
      }
    }
  }
  // Drop words replaced by an empty string, then rejoin without stray spaces
  return implode(' ', array_filter($words, 'strlen'));
}

answered Sep 25, 2015 at 16:19

Brian Hollon


1 Comment

Bono Over a year ago

Could you provide an explanation with your code? It might help the OP or future users more.

Ali · Accepted Answer · 2009-07-11 14:39:53Z

-3

You would need to write a regular expression to extract the URLs.

answered Jul 11, 2009 at 14:39

Ali
