python, remove link from string [closed]

Question

Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 9 years ago.


How can I remove the link from this string?

s=' hello how are you www.ford.com today '

so that the output is

s='hello how are you today'

Are you asking for a general solution or just for that string? URLs can be extremely diverse. – Natecat, commented Mar 31, 2016 at 2:24
A solution that can handle any substring of the form www.something.com. – Mustard Tiger, commented Mar 31, 2016 at 2:26

A.J. Uppal · Accepted Answer · 2016-03-31 02:45:46Z

7

Try the following generator expression, which omits words of the form www._____.com:

' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
# len(item) > 7 makes sure that tokens like www.com, which aren't real URLs, are not removed

>>> s=' hello how are you www.ford.com today '
>>> ' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
'hello how are you today'
>>>
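The same filter can be wrapped in a small reusable function. This is a minimal sketch: the name `remove_www_com_words` is hypothetical, and like the one-liner it only inspects whitespace-separated tokens, so a link glued to its neighbours (as in `youwww.ford.comtoday`) is not removed.

```python
def remove_www_com_words(text: str) -> str:
    """Drop whitespace-separated tokens shaped like www.<something>.com.

    Hypothetical helper wrapping the one-liner; because the text is split on
    whitespace, the result is also trimmed and single-spaced.
    """
    def is_link(token: str) -> bool:
        # len(token) > 7 keeps "www.com", where "www." and ".com" share the dot
        return token.startswith('www.') and token.endswith('.com') and len(token) > 7

    return ' '.join(token for token in text.split() if not is_link(token))


print(remove_www_com_words(' hello how are you www.ford.com today '))
# hello how are you today
print(remove_www_com_words('visit www.com or www.ford.com'))
# visit www.com or
```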

edited Mar 31, 2016 at 2:45

answered Mar 31, 2016 at 2:27

A.J. Uppal

19.3k reputation · 7 gold badges · 47 silver badges · 82 bronze badges


6 Comments

Natecat Over a year ago

That is a very elegant and readable solution

A.J. Uppal Over a year ago

@Natecat not sure if you're being sarcastic :)

Mustard Tiger Over a year ago

What if the string is s=' hello how are youwww.ford.comtoday ', where the words in the string have no spaces around the link? @A.J.

Natecat Over a year ago

@abcla My answer addresses this case

TigerhawkT3 Over a year ago

Wouldn't www.com, which is not a URL, get removed by this?


Ben · Accepted Answer · 2016-03-31 02:48:48Z

2

While you can certainly use string methods, I prefer the regular-expression-based approach. Because \S+ never matches whitespace, it leaves alone a www. and a .com that belong to separate words.

import re

s = " hello www.something.com there bobby"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # prints " hello  there bobby" (the spaces around the link remain)
s = "hello www. begins and .com ends"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # hello www. begins and .com ends (unchanged: \S+ cannot match the spaces)
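Because the pattern removes only the link itself, the surrounding spaces are left behind (note the double space in the first output). Below is a small sketch that collapses the leftover whitespace afterwards, assuming it is acceptable to normalise every run of whitespace to a single space; `strip_www_links` is a hypothetical name.

```python
import re

# Same pattern as in the answer: "www.", at least one non-whitespace char, then ".com"
WWW_COM = re.compile(r'www\.\S+\.com')


def strip_www_links(text: str) -> str:
    """Remove www.<...>.com links, then collapse leftover whitespace."""
    without_links = WWW_COM.sub('', text)
    # split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace, so joining restores single spacing
    return ' '.join(without_links.split())


print(strip_www_links(' hello www.something.com there bobby'))  # hello there bobby
print(strip_www_links(' hello how are youwww.ford.comtoday '))  # hello how are youtoday
```

When a link is glued to the words around it, those words end up joined (`youtoday`), since nothing in the string says where a space belonged.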

edited Mar 31, 2016 at 2:48

answered Mar 31, 2016 at 2:40

Ben

6,483 reputation · 4 gold badges · 38 silver badges · 46 bronze badges

3 Comments

Natecat Over a year ago

This fails on the same condition as mine, as Gerrat mentioned.

Ben Over a year ago

@Natecat should work with spaces between the phrases now.

user3849475 Over a year ago

If s = "hello www. begins and .com ends" and I also want to print "hello there bobby", how can I remove a single whitespace within the URL?

Community · Accepted Answer · 2017-05-23 11:59:37Z

2

This seems like a good situation for a regex substitution.

>>> import re
>>> s = ' hello how are you www.ford.com today www.example.co.jp '
>>> re.sub(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*', ' ', s).strip()
'hello how are you today'

The above finds any string that starts with optional whitespace, then possibly https:// or http://, then www., then any non-whitespace characters, then a . followed by 2-5 letters, then optional whitespace. It replaces each such string with a single space, and then removes leading and trailing whitespace from the result.
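The same pattern can be written with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments, so each piece of the description sits next to the part it explains. This is a sketch of the identical regex; only the layout and the hypothetical helper name `remove_links` are new.

```python
import re

# The same pattern as the one-liner, written with re.VERBOSE so each part
# can carry a comment (whitespace inside the pattern is ignored in this mode)
URL_PATTERN = re.compile(r"""
    \s*                # optional whitespace before the link
    (?:https?://)?     # optional http:// or https://
    www\.              # literal "www."
    \S*                # any run of non-whitespace characters
    \.[A-Za-z]{2,5}    # a dot followed by 2-5 letters, e.g. ".com" or ".jp"
    \s*                # optional whitespace after the link
""", re.VERBOSE)


def remove_links(text: str) -> str:
    # Replace each link (plus its surrounding whitespace) with one space,
    # then trim the ends
    return URL_PATTERN.sub(' ', text).strip()


print(remove_links(' hello how are you www.ford.com today www.example.co.jp '))
# hello how are you today
```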

Note that this is a naive example of a URL, as defined by your specific example. See this answer for a regex with a more complete definition of what constitutes a URL.

edited May 23, 2017 at 11:59

Community Bot

1 reputation · 1 silver badge

answered Mar 31, 2016 at 2:49

TigerhawkT3

49.5k reputation · 6 gold badges · 65 silver badges · 101 bronze badges


Natecat · Accepted Answer · 2016-03-31 02:36:11Z

0

To handle the case where there is no space around the URL, you can use the string split method like this:

if "www." in s and ".com" in s:
    # keep what comes before the first "www." and after the first ".com"
    s = ''.join((s.split("www.")[0], " ", s.split(".com")[1]))
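This split-based approach removes only the first link, and, as Gerrat points out, it also deletes everything between a "www." and a ".com" that belong to separate words. Below is a sketch that stays with string methods but loops over every occurrence and skips a www./.com pair separated by whitespace. The helper name `remove_www_com` is hypothetical, and treating "no whitespace between www. and .com" as the definition of a link is an assumption based on the question's www.something.com form.

```python
def remove_www_com(text: str) -> str:
    """Remove every www.<...>.com span that contains no whitespace.

    Hypothetical helper built only on str.find; a "www." whose next ".com"
    is separated from it by whitespace is left alone.
    """
    result = []
    pos = 0
    while True:
        start = text.find('www.', pos)
        if start == -1:
            break
        end = text.find('.com', start + len('www.'))
        if end == -1:
            break
        end += len('.com')
        if any(ch.isspace() for ch in text[start:end]):
            # "www." and ".com" belong to different words: keep "www." and move on
            result.append(text[pos:start + len('www.')])
            pos = start + len('www.')
            continue
        result.append(text[pos:start])
        result.append(' ')  # stand-in for the removed link
        pos = end
    result.append(text[pos:])
    # collapse the extra spaces introduced around removed links
    return ' '.join(''.join(result).split())


print(remove_www_com(' hello how are youwww.ford.comtoday '))
# hello how are you today
print(remove_www_com('The prefix www. often starts a url, while .com ends it'))
# The prefix www. often starts a url, while .com ends it
```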

answered Mar 31, 2016 at 2:36

Natecat

2,193 reputation · 1 gold badge · 17 silver badges · 20 bronze badges

1 Comment

Gerrat Over a year ago

Your expression fails on sentences like: The prefix www. often starts a url, while .com ends it
