I have many strings (Twitter tweets) from which I would like to remove the links when I echo them.

I have no control over the strings. All the links start with http, but they may or may not end with a "/" or a ";", and they may or may not be followed by a space. Also, sometimes there is no space between the link and the word just before it.

One example of such a string:

The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge

I have tried playing around with preg_replace, but couldn't come up with a solution that handles all the exceptions:

<?php echo preg_replace("/\http[^)]+\;/","",$feed->itemTitle); ?>

Any idea how I should proceed?

Edit: I have tried

<?php echo preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', ' ', $feed->itemTitle); ?>

but still no success.

Edit 2: I found this one:

<?php echo preg_replace('^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&amp;%\$#_]*)?$^',' ', $feed->itemTitle); ?>

which removes the link as expected, but it also deletes the entire string when there is no space between the link and the word that precedes it.

  • 1
    Related: What is the best regular expression to check if a string is a valid URL? Commented Jul 5, 2014 at 16:38
  • @DavidThomas Sorry: a typo! Thanks Theftprevention! Commented Jul 5, 2014 at 16:48
  • @gronostaj, thanks for the link. My knowledge of PHP is very limited and I am trying to find my way through the most upvoted answer. Commented Jul 5, 2014 at 16:48
  • @Arone you don't need that PHP code, just the regex to match URLs. Commented Jul 5, 2014 at 16:50
  • 3
    This is the most common regex I've seen, and it may work for you too: $feed->itemTitle = preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', ' ', $feed->itemTitle); Commented Jul 5, 2014 at 16:51

3 Answers

26

If you want to remove everything from the link onward, including whatever follows it (such as the "via @edge" part in your example), the following may help:

$string = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$regex = "@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?).*$)@";
echo preg_replace($regex, ' ', $string);

If you want to keep the text that follows the link:

$string = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$regex = "@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@";
echo preg_replace($regex, ' ', $string);
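
Since the question involves many tweets, the second pattern can be wrapped in a small helper. This is only a sketch: the function name stripLinks and the whitespace clean-up step are assumptions added for illustration, not part of the snippet above.

```php
<?php
// Hypothetical helper: removes http/https links from a single tweet.
function stripLinks(string $text): string
{
    // Same pattern as the "keep the text after the link" variant,
    // written in single quotes so PHP does not interpret the backslashes.
    $regex = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
    $clean = preg_replace($regex, ' ', $text);

    // Added step (assumption): collapse the extra whitespace left behind.
    return trim(preg_replace('/\s+/', ' ', $clean));
}

echo stripLinks("The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge");
// prints: The Third Culture: The Frontline of Global Thinking via @edge
```

Note that the pattern also swallows the ";" directly after the link, because the final [^\.\s] accepts any character that is neither a dot nor whitespace.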

1 Comment

Nowadays, some URLs look like this: https://www.i-cable.com/新聞資訊/134350/男護士索女網友性感照逼見面-威脅上傳討論區 which contains Chinese characters. The regex above cannot handle it.
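
One possible way to cope with such URLs is to stop listing the allowed characters and treat a link as "http:// or https:// followed by any run of non-whitespace", with the u modifier so the subject is read as UTF-8. A rough sketch under that assumption (the function name is hypothetical); the trade-off is that text glued to the end of a link without a space, such as ";via" in the question, is removed along with it:

```php
<?php
// Hypothetical helper: strips links, including ones with non-ASCII paths.
function stripLinksUnicode(string $text): string
{
    // \S+ consumes everything up to the next whitespace character;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_replace('~https?://\S+~u', ' ', $text);
}

echo stripLinksUnicode("News: https://www.i-cable.com/新聞資訊/134350/男護士索女網友性感照逼見面-威脅上傳討論區 more text");
```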
3

I would do something like this:

$input = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";
$replace = '"(https?://.*)(?=;)"';

$output = preg_replace($replace, '', $input);
print_r($output);

It works for multiple occurrences too:

$output = preg_replace($replace, '', $input."\n".$input);
print_r($output);

1 Comment

Thanks @jamb for your answer. However, sometimes the link doesn't end with ";", so I need to find a more general regex.
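
A sketch of a more general variant for that case, assuming a link ends at the first whitespace character or ";", whichever comes first, and that a trailing ";" should be removed when present. The pattern and variable names are illustrative and not taken from the answer above; a URL that legitimately contains ";" would be cut short by it.

```php
<?php
$input = "The Third Culture: The Frontline of Global Thinkinghttp://is.gd/qFioda;via @edge";

// [^\s;]+ stops at whitespace or ";", and ;? consumes an optional trailing ";".
$pattern = '~https?://[^\s;]+;?~';

$output = preg_replace($pattern, ' ', $input);
echo $output, "\n";
// prints: The Third Culture: The Frontline of Global Thinking via @edge

// A link that ends with "/" and is followed by a space is handled too.
$other = preg_replace($pattern, ' ', "Read this http://is.gd/qFioda/ now");
echo $other, "\n";
```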
0

If your URL begins simply with www and no protocol, modify it like this to filter it:

$string = preg_replace('/\b((https?|ftp|file):\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', ' ', $string);

Credits: https://gist.github.com/madeinnordeste/e071857148084da94891
