1

How can the link be removed from this string?

s=' hello how are you www.ford.com today '

so that the output is

s='hello how are you today'
2
  • 1
    Are you asking for a general solution or just for that string? URLs can be extremely diverse. Commented Mar 31, 2016 at 2:24
  • A solution that can handle any substring of the form www.something.com. Commented Mar 31, 2016 at 2:26

4 Answers

7

Try the following generator expression, which omits words matching the pattern www._____.com:

# The len(item) > 7 check keeps words like www.com, which aren't real URLs, from being removed
' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))

>>> s=' hello how are you www.ford.com today '
>>> ' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
'hello how are you today'
>>> 
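
As a minimal sketch, the same filter can be wrapped in a function so the test for a URL-shaped word is easier to read. The names `strip_www_com_words` and `looks_like_url` are made up for illustration and are not part of the answer above. Like the one-liner, this only removes URLs that are separated from the neighbouring words by whitespace.

```python
def strip_www_com_words(text):
    """Drop whitespace-separated words shaped like www.<something>.com."""
    def looks_like_url(word):
        # Same length check as the one-liner: it rules out the 7-character "www.com"
        return word.startswith('www.') and word.endswith('.com') and len(word) > 7

    # split() with no arguments also discards leading, trailing and repeated whitespace
    return ' '.join(word for word in text.split() if not looks_like_url(word))


print(strip_www_com_words(' hello how are you www.ford.com today '))
```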

6 Comments

That is a very elegant and readable solution
@Natecat not sure if you're being sarcastic :)
What if the string is s=' hello how are youwww.ford.comtoday ', with no spaces between the link and the surrounding words? @A.J.
@abcla My answer addresses this case
Wouldn't www.com, which is not a URL, get removed by this?
2

While you can certainly use string methods, I prefer the regular-expression-based approach. Because \S+ cannot match whitespace, it leaves the text alone when www. and .com appear in separate words.

import re

s = " hello www.something.com there bobby"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # " hello  there bobby" (the spaces around the URL remain)
s = "hello www. begins and .com ends"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # hello www. begins and .com ends (unchanged)
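
The substitution leaves the whitespace that surrounded the URL in place. One possible way to tidy that up, sketched here with a hypothetical helper named `remove_www_com`, is to substitute a space and then collapse every whitespace run with split() and join():

```python
import re


def remove_www_com(text):
    # Replace each www.<non-whitespace>.com run with a space so that
    # words glued to the URL do not get merged together
    without_urls = re.sub(r'www\.\S+\.com', ' ', text)
    # split() drops leading/trailing whitespace and collapses repeated spaces
    return ' '.join(without_urls.split())


print(remove_www_com(" hello www.something.com there bobby"))  # hello there bobby
```

Note that this normalizes all whitespace in the string, not just the whitespace next to the removed URL.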

3 Comments

This fails on the same condition as mine, as Gerrat mentioned.
@Natecat should work with spaces between the phrases now.
If s = "hello www. begins and .com ends" and I also want to print hello there bobby, how can I remove the single whitespace within the URL?
2

This seems like a good situation for a regex substitution.

>>> import re
>>> s = ' hello how are you www.ford.com today www.example.co.jp '
>>> re.sub(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*', ' ', s).strip()
'hello how are you today'

The above finds any string that starts with potential whitespace, then possibly https:// or http://, then www., then any non-whitespace characters, then . followed by 2-5 alphabetical characters, then potential whitespace. It replaces such strings with a single space, and then removes leading and trailing whitespace from the result.
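
The pieces described above can be laid out one per line with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a restatement of the same pattern; the names `URL_PATTERN` and `remove_urls` are chosen here for illustration.

```python
import re

# Same pattern as above, split into its parts
URL_PATTERN = re.compile(r"""
    \s*                 # optional leading whitespace
    (?:https?://)?      # optional http:// or https://
    www\.               # a literal www.
    \S*                 # any run of non-whitespace characters
    \.[A-Za-z]{2,5}     # a dot followed by 2-5 letters
    \s*                 # optional trailing whitespace
""", re.VERBOSE)


def remove_urls(text):
    # Replace each match with a single space, then trim both ends
    return URL_PATTERN.sub(' ', text).strip()


print(remove_urls(' hello how are you www.ford.com today www.example.co.jp '))
```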

Note that this is a naive example of a URL, as defined by your specific example. See this answer for a regex with a more complete definition of what constitutes a URL.


0

To deal with the case where there is no space around the URL, you can use the string split method like this:

# Handles a single URL only; everything between "www." and ".com" is dropped
if "www." in s and ".com" in s:
    s = ''.join((s.split("www.")[0], " ", s.split(".com")[1]))
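
A slightly more defensive variant of the same idea can be sketched with str.partition. The function name `drop_first_www_com` is hypothetical. It assumes at most one URL per string, and it leaves the string untouched unless ".com" follows "www." with no whitespace in between.

```python
def drop_first_www_com(text):
    """Remove the first www.<something>.com, even when glued to other words."""
    head, found_www, rest = text.partition("www.")
    if not found_www:
        return text  # no "www." at all
    middle, found_com, tail = rest.partition(".com")
    # Give up if ".com" is missing, nothing sits between the markers,
    # or the markers are separated by whitespace (i.e. ordinary prose)
    if not found_com or not middle or any(ch.isspace() for ch in middle):
        return text
    # Rejoin the two halves and collapse whitespace runs
    return " ".join((head + " " + tail).split())


print(drop_first_www_com(' hello how are youwww.ford.comtoday '))  # hello how are you today
```

When nothing is removed, the input is returned exactly as given, without any whitespace normalization.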

1 Comment

Your expression fails on sentences like: The prefix www. often starts a URL, while .com ends it
